Daily Papers

The project automatically fetches the latest papers from arXiv based on keywords.

The subheadings in the README file represent the search keywords.

Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.

You can click the 'Watch' button to receive daily email notifications.

Last update: 2025-12-23

Time Series

Title | Date | Abstract | Comment
Day-Ahead Electricity Price Forecasting Using Merit-Order Curves Time Series 2025-12-19

We introduce a general, simple, and computationally efficient framework for predicting day-ahead supply and demand merit-order curves, from which both point and probabilistic electricity price forecasts can be derived. Specifically, we leverage functional principal component analysis to efficiently represent a pair of supply and demand curves in a low-dimensional vector space and employ regularized vector autoregressive models for their prediction. We conduct a rigorous empirical comparison of price forecasting performance between the proposed curve-based model, i.e., derived from predicted merit-order curves, and state-of-the-art price-based models that directly forecast the clearing price, using data from the Italian day-ahead market over the 2023-2024 period. Our results show that the proposed curve-based approach significantly improves both point and probabilistic price forecasting accuracy relative to price-based approaches, with average gains of approximately 5%, and improvements of up to 10% during mid-day hours, when prices occasionally drop due to high renewable generation and low demand.
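
The recipe this abstract describes (compress discretized curves to a few principal-component scores, forecast the scores with a vector autoregression, reconstruct the curve) is easy to prototype. Below is a minimal sketch on synthetic toy curves using standard libraries; it illustrates the general idea only and is not the authors' implementation, which additionally regularizes the VAR.

```python
# Curve forecasting in miniature: PCA as a stand-in for functional PCA,
# a VAR on the low-dimensional score series, then curve reconstruction.
import numpy as np
from sklearn.decomposition import PCA
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
n_days, n_grid = 300, 50                       # days x points on a price grid
t = np.linspace(0, 1, n_grid)
latent = np.cumsum(rng.normal(size=(n_days, 3)), axis=0) * 0.1
curves = latent @ np.vstack([t, t**2, np.sin(np.pi * t)])  # toy supply curves

pca = PCA(n_components=3)                      # functional PCA, discretized
z = pca.fit_transform(curves)                  # score time series (300, 3)

var = VAR(z).fit(maxlags=2)                    # plain VAR; no regularization
z_next = var.forecast(z[-var.k_ar:], steps=1)  # one-step-ahead scores
curve_next = pca.inverse_transform(z_next)     # predicted day-ahead curve
print(curve_next.shape)                        # (1, 50)
```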

Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification 2025-12-19

Time series classification plays a fundamental role in a wide range of real-world applications. Recently, large language models (LLMs) have demonstrated strong generalization and reasoning capacities, but directly applying them to time series classification remains non-trivial due to the representation gap between numerical sequences and linguistic semantics. In this paper, we propose HiTime, a hierarchical LLM-based framework for multimodal time series classification that bridges structured temporal representations with semantic reasoning in a generative paradigm. Specifically, we design a hierarchical sequence feature encoding module composed of a data-specific encoder and a task-specific encoder to extract complementary temporal features. To mitigate the embedding gap between time series representations and textual semantics, we further introduce a semantic space alignment module that jointly performs coarse-grained global modeling and fine-grained cross-modal correspondence. Building upon the above representations, we employ a parameter-efficient supervised fine-tuning strategy to activate the generative classification capability of the aligned LLMs, thereby transforming conventional discriminative time series classification into a generative task. Extensive experiments on multiple benchmarks demonstrate that the proposed framework consistently outperforms state-of-the-art baselines. The code is publicly available at https://github.com/Xiaoyu-Tao/HiTime.

Bayesian Markov-Switching Partial Reduced-Rank Regression 2025-12-19

Reduced-Rank (RR) regression is a powerful dimensionality reduction technique, but it overlooks any possible group configuration among the responses by assuming a low-rank structure on the entire coefficient matrix. Moreover, the temporal change of the relations between predictors and responses in time series induces a possibly time-varying grouping structure in the responses. To address these limitations, a Bayesian Markov-switching partial RR (MS-PRR) model is proposed, where the response vector is partitioned into two groups to reflect different complexity of the relationship. A *simple* group assumes a low-rank linear regression, while a *complex* group exploits nonparametric regression via a Gaussian Process. Unlike traditional approaches, group assignments and rank are treated as unknown parameters to be estimated. Temporal persistence in the regression function is then accounted for by a Markov-switching process that drives the changes in the grouping structure and model parameters over time. Full Bayesian inference is performed via a partially collapsed Gibbs sampler, which allows uncertainty quantification without the need for trans-dimensional moves. Applications to two real-world macroeconomic and commodity datasets demonstrate evidence of time-varying grouping and different degrees of complexity both across states and within each state.

28 pages; Supplement available upon request to the author

A lightweight Spatial-Temporal Graph Neural Network for Long-term Time Series Forecasting 2025-12-19

We propose Lite-STGNN, a lightweight spatial-temporal graph neural network for long-term multivariate forecasting that integrates decomposition-based temporal modeling with learnable sparse graph structure. The temporal module applies trend-seasonal decomposition, while the spatial module performs message passing with low-rank Top-$K$ adjacency learning and conservative horizon-wise gating, enabling spatial corrections that enhance a strong linear baseline. Lite-STGNN achieves state-of-the-art accuracy on four benchmark datasets for horizons up to 720 steps, while being parameter-efficient and substantially faster to train than transformer-based methods. Ablation studies show that the spatial module yields 4.6% improvement over the temporal baseline, Top-$K$ enhances locality by 3.3%, and learned adjacency matrices reveal domain-specific interaction dynamics. Lite-STGNN thus offers a compact, interpretable, and efficient framework for long-term multivariate time series forecasting.

9 pages, 5 figures, 2 tables. Accepted for presentation at the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026), Marbella, Spain
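
Two ingredients named in this abstract, trend-seasonal decomposition and low-rank Top-$K$ adjacency learning, can be sketched in isolation. The toy functions below are illustrative only; in Lite-STGNN the corresponding modules are learned end to end.

```python
# Moving-average trend-seasonal split plus a Top-K sparsified low-rank
# adjacency built from node embeddings. Shapes and values are toy choices.
import numpy as np

def decompose(x, kernel=25):
    """Split a (time, nodes) array into trend and seasonal components."""
    pad = kernel // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kern = np.ones(kernel) / kernel
    trend = np.stack([np.convolve(xp[:, i], kern, mode="valid")
                      for i in range(x.shape[1])], axis=1)
    return trend, x - trend

def topk_adjacency(emb, k=3):
    """Low-rank adjacency (rank <= emb dim), keeping Top-K entries per row."""
    scores = emb @ emb.T
    np.fill_diagonal(scores, -np.inf)              # no self-loops
    keep = np.argsort(scores, axis=1)[:, -k:]      # K strongest neighbours
    adj = np.zeros_like(scores)
    rows = np.arange(scores.shape[0])[:, None]
    adj[rows, keep] = scores[rows, keep]
    return adj

x = np.random.default_rng(1).normal(size=(200, 7))   # 200 steps, 7 nodes
trend, seasonal = decompose(x)
adj = topk_adjacency(np.random.default_rng(2).normal(size=(7, 4)))
print(trend.shape, (adj != 0).sum(axis=1))           # (200, 7), 3 edges/node
```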

Human-like Content Analysis for Generative AI with Language-Grounded Sparse Encoders 2025-12-19

The rapid development of generative AI has transformed content creation, communication, and human development. However, this technology raises profound concerns in high-stakes domains, demanding rigorous methods to analyze and evaluate AI-generated content. While existing analytic methods often treat images as indivisible wholes, real-world AI failures generally manifest as specific visual patterns that can evade holistic detection and call for more granular, decomposed analysis. Here we introduce a content analysis tool, Language-Grounded Sparse Encoders (LanSE), which decomposes images into interpretable visual patterns with natural language descriptions. Utilizing interpretability modules and large multimodal models, LanSE can automatically identify visual patterns within data modalities. Our method discovers more than 5,000 visual patterns with 93% human agreement, provides decomposed evaluation outperforming existing methods, establishes the first systematic evaluation of physical plausibility, and extends to medical imaging settings. Our method's capability to extract language-grounded patterns can be naturally adapted to numerous fields, including biology and geography, as well as other data modalities such as protein structures and time series, thereby advancing content analysis for generative AI.

Towards Causal Market Simulators 2025-12-19

Market generators using deep generative models have shown promise for synthetic financial data generation, but existing approaches lack causal reasoning capabilities essential for counterfactual analysis and risk assessment. We propose a Time-series Neural Causal Model VAE (TNCM-VAE) that combines variational autoencoders with structural causal models to generate counterfactual financial time series while preserving both temporal dependencies and causal relationships. Our approach enforces causal constraints through directed acyclic graphs in the decoder architecture and employs the causal Wasserstein distance for training. We validate our method on synthetic autoregressive models inspired by the Ornstein-Uhlenbeck process, demonstrating superior performance in counterfactual probability estimation with L1 distances as low as 0.03-0.10 compared to ground truth. The model enables financial stress testing, scenario analysis, and enhanced backtesting by generating plausible counterfactual market trajectories that respect underlying causal mechanisms.

ICAIF 2025 Workshop on Rethinking Financial Time-Series
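
The validation relies on synthetic autoregressive models inspired by the Ornstein-Uhlenbeck process; for orientation, a basic Euler-Maruyama simulation of that process looks as follows. This is a generic sketch, not the paper's benchmark code.

```python
# Euler-Maruyama discretization of dX_t = theta * (mu - X_t) dt + sigma dW_t.
import numpy as np

def simulate_ou(theta=2.0, mu=0.0, sigma=0.3, x0=1.0, dt=0.01, n=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        dw = rng.normal(scale=np.sqrt(dt))          # Brownian increment
        x[t] = x[t-1] + theta * (mu - x[t-1]) * dt + sigma * dw
    return x

path = simulate_ou()
print(f"mean {path.mean():.3f}, std {path.std():.3f}")  # mean-reverts to mu
```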

STDiff: A State Transition Diffusion Framework for Time Series Imputation in Industrial Systems 2025-12-19

Incomplete sensor data is a major obstacle in industrial time-series analytics. In wastewater treatment plants (WWTPs), key sensors show long, irregular gaps caused by fouling, maintenance, and outages. We introduce STDiff and STDiff-W, diffusion-based imputers that cast gap filling as state-space simulation under partial observability, where targets, controls, and exogenous signals may all be intermittently missing. STDiff learns a one-step transition model conditioned on observed values and masks, while STDiff-W extends this with a context encoder that jointly inpaints contiguous blocks, combining long-range consistency with short-term detail. On two WWTP datasets (one with synthetic block gaps from Agtrup and another with natural outages from Avedøre), STDiff-W achieves state-of-the-art accuracy compared with strong neural baselines such as SAITS, BRITS, and CSDI. Beyond point-error metrics, its reconstructions preserve realistic dynamics including oscillations, spikes, and regime shifts, and they achieve top or tied-top downstream one-step forecasting performance compared with strong neural baselines, indicating that preserving dynamics does not come at the expense of predictive utility. Ablation studies that drop, shuffle, or add noise to control or exogenous inputs consistently degrade NH4 and PO4 performance, with the largest deterioration observed when exogenous signals are removed, showing that the model captures meaningful dependencies. We conclude with practical guidance for deployment: evaluate performance beyond MAE using task-oriented and visual checks, include exogenous drivers, and balance computational cost against robustness to structured outages.

Peer-reviewed and published in Expert Systems with Applications, Volume 302 (2026). This version reflects the published article

SportsGPT: An LLM-driven Framework for Interpretable Sports Motion Assessment and Training Guidance 2025-12-19

Existing intelligent sports analysis systems mainly focus on "scoring and visualization," often lacking automatic performance diagnosis and interpretable training guidance. Recent advances in Large Language Models (LLMs) and motion analysis techniques provide new opportunities to address the above limitations. In this paper, we propose SportsGPT, an LLM-driven framework for interpretable sports motion assessment and training guidance, which establishes a closed loop from motion time-series input to professional training guidance. First, given a set of high-quality target models, we introduce MotionDTW, a two-stage time series alignment algorithm designed for accurate keyframe extraction from skeleton-based motion sequences. Subsequently, we design a Knowledge-based Interpretable Sports Motion Assessment Model (KISMAM) to obtain a set of interpretable assessment metrics (e.g., insufficient extension) by contrasting the keyframes with the target models. Finally, we propose SportsRAG, a RAG-based training guidance model built upon Qwen3. Leveraging a 6B-token knowledge base, it prompts the LLM to generate professional training guidance by retrieving domain-specific QA pairs. Experimental results demonstrate that MotionDTW significantly outperforms traditional methods with lower temporal error and higher IoU scores. Furthermore, ablation studies validate the KISMAM and SportsRAG, confirming that SportsGPT surpasses general LLMs in diagnostic accuracy and professionalism.
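
MotionDTW is a two-stage variant and is not reproduced here, but the primitive it builds on, dynamic time warping between two motion signals, fits in a few lines. A generic textbook DTW for orientation:

```python
# Classic O(n*m) dynamic time warping between two 1-D signals.
import numpy as np

def dtw(a, b):
    """Return the cumulative cost matrix and total alignment cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i-1] - b[j-1])
            cost[i, j] = d + min(cost[i-1, j], cost[i, j-1], cost[i-1, j-1])
    return cost[1:, 1:], cost[n, m]

a = np.sin(np.linspace(0, 2 * np.pi, 60))
b = np.sin(np.linspace(0, 2 * np.pi, 80) + 0.3)   # stretched, shifted variant
_, total = dtw(a, b)
print(f"DTW alignment cost: {total:.3f}")
```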

Electric Vehicle Charging Load Forecasting: An Experimental Comparison of Machine Learning Methods 2025-12-19

With the growing popularity of electric vehicles (EVs) as a means of addressing climate change, concerns have emerged regarding their impact on electric grid management. As a result, predicting EV charging demand has become a timely and important research problem. While substantial research has addressed energy load forecasting in transportation, relatively few studies systematically compare multiple forecasting methods across different temporal horizons and spatial aggregation levels in diverse urban settings. This work investigates the effectiveness of five time series forecasting models, ranging from traditional statistical approaches to machine learning and deep learning methods. Forecasting performance is evaluated for short-, mid-, and long-term horizons (on the order of minutes, hours, and days, respectively), and across spatial scales ranging from individual charging stations to regional and city-level aggregations. The analysis is conducted on four publicly available real-world datasets, with results reported independently for each dataset. To the best of our knowledge, this is the first work to systematically evaluate EV charging demand forecasting across such a wide range of temporal horizons and spatial aggregation levels using multiple real-world datasets.

18 pages, 2 figures, 5 tables

A Structure-Preserving Assessment of VBPBB for Time Series Imputation Under Periodic Trends, Noise, and Missingness Mechanisms 2025-12-19

Incomplete time-series data compromise statistical inference, particularly when the underlying process exhibits periodic structure (e.g., annual or monthly cycles). Conventional imputation procedures rarely account for such temporal dependence, leading to attenuation of seasonal signals and biased estimates. This study proposes and evaluates a structure-preserving multiple imputation framework that augments imputation models with frequency-specific covariates derived via the Variable Bandpass Periodic Block Bootstrap (VBPBB). In controlled simulations, we generate series with annual and monthly components, impose Gaussian noise across low, moderate, and high signal-to-noise regimes, and introduce Missing Completely at Random (MCAR) patterns from 5% to 70% missingness. Dominant periodic components are extracted with VBPBB, resampled to stabilize uncertainty, and incorporated as covariates in Amelia II. Compared with baseline methods that do not model temporal structure, the VBPBB-enhanced approach consistently yields lower imputation error and superior retention of periodic features, with the largest gains observed under high noise and when multiple components are included. These findings demonstrate that explicitly modeling periodic content during imputation improves reconstruction accuracy and preserves time-series structure in the presence of substantial missingness.

24 pages, 6 figures, and 3 tables
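
The central move, imputing with covariates that encode the dominant frequencies so filled values respect the cycle, can be shown in miniature with plain least squares on sine/cosine terms at a known annual period. This drastically simplifies the paper's VBPBB-plus-Amelia II pipeline and is purely illustrative.

```python
# Structure-preserving imputation sketch: fit harmonic covariates on the
# observed points, fill MCAR gaps from the fitted seasonal curve.
import numpy as np

rng = np.random.default_rng(0)
n, period = 365 * 3, 365
t = np.arange(n)
y = 10 * np.sin(2 * np.pi * t / period) + rng.normal(0, 2, n)

miss = rng.random(n) < 0.3                    # 30% MCAR missingness
X = np.column_stack([np.ones(n),
                     np.sin(2 * np.pi * t / period),
                     np.cos(2 * np.pi * t / period)])
beta, *_ = np.linalg.lstsq(X[~miss], y[~miss], rcond=None)
y_imp = y.copy()
y_imp[miss] = X[miss] @ beta                  # periodic-structure-aware fill
rmse = np.sqrt(np.mean((y_imp[miss] - y[miss])**2))
print(f"RMSE on missing points: {rmse:.2f}")  # close to the noise floor (2.0)
```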

Modelling financial time series with $φ^{4}$ quantum field theory 2025-12-19

We use a $φ^{4}$ quantum field theory with inhomogeneous couplings and explicit symmetry-breaking to model an ensemble of financial time series from the S&P 500 index. The continuum nature of the $φ^4$ theory avoids the inaccuracies that occur in Ising-based models which require a discretization of the time series. We demonstrate this using the example of the 2008 global financial crisis. The $φ^{4}$ quantum field theory is expressive enough to reproduce the higher-order statistics such as the market kurtosis, which can serve as an indicator of possible market shocks. Accurate reproduction of high kurtosis is absent in binarized models. Therefore, Ising models, despite being widely employed in econophysics, are incapable of fully representing empirical financial data, a limitation not present in the generalization of the $φ^{4}$ scalar field theory. We then investigate the scaling properties of the $φ^{4}$ machine learning algorithm and extract exponents which govern the behavior of the learned couplings (or weights and biases in ML language) in relation to the number of stocks in the model. Finally, we use our model to forecast the price changes of the AAPL, MSFT, and NVDA stocks. We conclude by discussing how the $φ^{4}$ scalar field theory could be used to build investment strategies and the possible intuitions that the QFT operations of dimensional compactification and renormalization can provide for financial modelling.

UniCoMTE: A Universal Counterfactual Framework for Explaining Time-Series Classifiers on ECG Data 2025-12-18

Machine learning models, particularly deep neural networks, have demonstrated strong performance in classifying complex time series data. However, their black-box nature limits trust and adoption, especially in high-stakes domains such as healthcare. To address this challenge, we introduce UniCoMTE, a model-agnostic framework for generating counterfactual explanations for multivariate time series classifiers. The framework identifies temporal features that most heavily influence a model's prediction by modifying the input sample and assessing its impact on the model's prediction. UniCoMTE is compatible with a wide range of model architectures and operates directly on raw time series inputs. In this study, we evaluate UniCoMTE's explanations on a time series ECG classifier. We quantify explanation quality by comparing our explanations' comprehensibility to that of established techniques (LIME and SHAP) and assessing their generalizability to similar samples. Furthermore, clinical utility is assessed through a questionnaire completed by medical experts who review counterfactual explanations presented alongside original ECG samples. Results show that our approach produces concise, stable, and human-aligned explanations that outperform existing methods in both clarity and applicability. By linking model predictions to meaningful signal patterns, the framework advances the interpretability of deep learning models for real-world time series applications.

21 pages, 7 figures

On the Identification of Temporally Causal Representation with Instantaneous Dependence 2025-12-18

Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observations, which are in general difficult to obtain in real-world scenarios. To fill this gap, we propose an **ID**entification framework for instantane**O**us **L**atent dynamics (**IDOL**) by imposing a sparse influence constraint that the latent causal processes have sparse time-delayed and instantaneous relations. Specifically, we establish identifiability results of the latent causal process based on sufficient variability and the sparse influence constraint by employing contextual information of time series data. Based on these theories, we incorporate a temporally variational inference architecture to estimate the latent variables and a gradient-based sparsity regularization to identify the latent causal process. Experimental results on simulation datasets illustrate that our method can identify the latent causal process. Furthermore, evaluations on multiple human motion forecasting benchmarks with instantaneous dependencies indicate the effectiveness of our method in real-world settings.

Bots Don't Sit Still: A Longitudinal Study of Bot Behaviour Change, Temporal Drift, and Feature-Structure Evolution 2025-12-18

Social bots are now deeply embedded in online platforms for promotion, persuasion, and manipulation. Most bot-detection systems still treat behavioural features as static, implicitly assuming bots behave stationarily over time. We test that assumption for promotional Twitter bots, analysing change in both individual behavioural signals and the relationships between them. Using 2,615 promotional bot accounts and 2.8M tweets, we build yearly time series for ten content-based meta-features. Augmented Dickey-Fuller and KPSS tests plus linear trends show all ten are non-stationary: nine increase over time, while language diversity declines slightly. Stratifying by activation generation and account age reveals systematic differences: second-generation bots are most active and link-heavy; short-lived bots show intense, repetitive activity with heavy hashtag/URL use; long-lived bots are less active but more linguistically diverse and use emojis more variably. We then analyse co-occurrence across generations using 18 interpretable binary features spanning actions, topic similarity, URLs, hashtags, sentiment, emojis, and media (153 pairs). Chi-square tests indicate almost all pairs are dependent. Spearman correlations shift in strength and sometimes polarity: many links (e.g. multiple hashtags with media; sentiment with URLs) strengthen, while others flip from weakly positive to weakly or moderately negative. Later generations show more structured combinations of cues. Taken together, these studies provide evidence that promotional social bots adapt over time at both the level of individual meta-features and the level of feature interdependencies, with direct implications for the design and evaluation of bot-detection systems trained on historical behavioural features.
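
The stationarity battery referenced above is standard; for reference, this is how the Augmented Dickey-Fuller and KPSS tests run in statsmodels on a toy trending feature series (the two tests have complementary nulls).

```python
# ADF null: unit root (non-stationary). KPSS null: level-stationary.
# A trending series should fail to reject ADF and reject KPSS.
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(0)
feature = 0.05 * np.arange(100) + rng.normal(size=100)  # upward-trending

adf_stat, adf_p, *_ = adfuller(feature)
kpss_stat, kpss_p, *_ = kpss(feature, regression="c", nlags="auto")
print(f"ADF  p = {adf_p:.3f}  (large: cannot reject unit root)")
print(f"KPSS p = {kpss_p:.3f}  (small: reject stationarity)")
```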

TimeSenCLIP: A Time Series Vision-Language Model for Remote Sensing Using Single-Pixel 2025-12-18

Vision-language models (VLMs) have shown significant promise in remote sensing applications, particularly for land-use and land-cover (LULC) mapping via zero-shot classification and retrieval. However, current approaches face several key challenges, such as the dependence on caption-based supervision, which is often not available or very limited in terms of the covered semantics, and the fact that they are adapted from generic VLM architectures suited to very high resolution images. Consequently, these models tend to prioritize spatial context over spectral and temporal information, limiting their effectiveness for medium-resolution remote sensing imagery. In this work, we present TimeSenCLIP, a lightweight VLM for remote sensing time series, using a cross-view temporal contrastive framework to align multispectral Sentinel-2 time series with geo-tagged ground-level imagery, without requiring textual annotations. Unlike prior VLMs, TimeSenCLIP emphasizes temporal and spectral signals over spatial context, investigating whether single-pixel time series contain sufficient information for solving a variety of tasks.

Paper under review

FORMSpoT: A Decade of Tree-Level, Country-Scale Forest Monitoring 2025-12-18

The recent decline of the European forest carbon sink highlights the need for spatially explicit and frequently updated forest monitoring tools. Yet, existing satellite-based disturbance products remain too coarse to detect changes at the scale of individual trees, typically below 100 m$^{2}$. Here, we introduce FORMSpoT (Forest Mapping with SPOT Time series), a decade-long (2014-2024) nationwide mapping of forest canopy height at 1.5 m resolution, together with annual disturbance polygons (FORMSpoT-$Δ$) covering mainland France. Canopy heights were derived from annual SPOT-6/7 composites using a hierarchical transformer model (PVTv2) trained on high-resolution airborne laser scanning (ALS) data. To enable robust change detection across heterogeneous acquisitions, we developed a dedicated post-processing pipeline combining co-registration and spatio-temporal total variation denoising. Validation against ALS revisits across 19 sites and 5,087 National Forest Inventory plots shows that FORMSpoT-$Δ$ substantially outperforms existing disturbance products. In mountainous forests, where disturbances are small and spatially fragmented, FORMSpoT-$Δ$ achieves an F1-score of 0.44, an order of magnitude higher than existing benchmarks. By enabling tree-level monitoring of forest dynamics at national scale, FORMSpoT-$Δ$ provides a unique tool to analyze management practices, detect early signals of forest decline, and better quantify carbon losses from subtle disturbances such as thinning or selective logging. These results underscore the critical importance of sustaining very high-resolution satellite missions like SPOT and open-data initiatives such as DINAMIS for monitoring forests under climate change.

KarmaTS: A Universal Simulation Platform for Multivariate Time Series with Functional Causal Dynamics 2025-12-18

We introduce KarmaTS, an interactive framework for constructing lag-indexed, executable spatiotemporal causal graphical models for multivariate time series (MTS) simulation. Motivated by the challenge of access-restricted physiological data, KarmaTS generates synthetic MTS with known causal dynamics and augments real-world datasets with expert knowledge. The system constructs a discrete-time structural causal process (DSCP) by combining expert knowledge and algorithmic proposals in a mixed-initiative, human-in-the-loop workflow. The resulting DSCP supports simulation and causal interventions, including those under user-specified distribution shifts. KarmaTS handles mixed variable types, contemporaneous and lagged edges, and modular edge functionals ranging from parameterizable templates to neural network models. Together, these features enable flexible validation and benchmarking of causal discovery algorithms through expert-informed simulation.
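
The object KarmaTS executes, a discrete-time structural causal process, boils down to variables generated from lagged and contemporaneous parents through edge functionals, with an intervention overriding an equation. A hand-rolled toy version, not KarmaTS itself:

```python
# Three-variable DSCP with lag-1, lag-2, and contemporaneous edges, plus a
# hard intervention do(X = 2.0) that shifts the downstream distribution.
import numpy as np

def simulate_dscp(n=500, intervene_x=None, seed=0):
    rng = np.random.default_rng(seed)
    x, y, z = np.zeros(n), np.zeros(n), np.zeros(n)
    for t in range(2, n):
        x[t] = (intervene_x if intervene_x is not None
                else 0.7 * x[t-1] + rng.normal(0, 0.5))       # autonomous driver
        y[t] = np.tanh(x[t-1]) + 0.3 * y[t-2] + rng.normal(0, 0.3)
        z[t] = 0.5 * y[t] + rng.normal(0, 0.3)                # contemporaneous
    return x, y, z

_, y_obs, _ = simulate_dscp()
_, y_do, _ = simulate_dscp(intervene_x=2.0)
print(f"E[Y] observational {y_obs.mean():.2f} vs under do(X=2) {y_do.mean():.2f}")
```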

Exponentially weighted estimands and the exponential family: filtering, prediction and smoothing 2025-12-18

We propose using a discounted version of a convex combination of the log-likelihood with the corresponding expected log-likelihood such that, when maximized, it yields a filter, predictor, and smoother for time series. This paper then focuses on working out the implications of this in the case of the canonical exponential family. The results are simple exact filters, predictors and smoothers with linear recursions. A theory for these models is developed and the models are illustrated on simulated and real data.
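
For a concrete feel of "exact filters with linear recursions" in the exponential family, here is the classic discounted Poisson-gamma filter, a well-known construction in the same spirit. It illustrates the flavour of the result, not the paper's specific estimands.

```python
# Gamma(a, b) prior on a Poisson rate with exponential forgetting: each step
# discounts past evidence by g, then absorbs the new count. Linear recursions.
import numpy as np

def discounted_poisson_gamma(y, a0=1.0, b0=1.0, g=0.95):
    a, b, means = a0, b0, []
    for yt in y:
        a = g * a + yt          # discounted shape parameter
        b = g * b + 1.0         # discounted effective sample size
        means.append(a / b)     # filtered mean of the rate
    return np.array(means)

rng = np.random.default_rng(0)
rates = np.where(np.arange(200) < 100, 3.0, 8.0)   # rate jumps halfway
y = rng.poisson(rates)
m = discounted_poisson_gamma(y)
print(m[[50, 99, 150, 199]])                        # tracks the level shift
```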

SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting 2025-12-18

Chronic Obstructive Pulmonary Disease (COPD), a major chronic respiratory disease with persistent airflow limitation, is a leading global cause of disability and mortality. Respiratory spirogram time series, routinely collected during pulmonary function tests (PFTs), play a critical role in the early detection of respiratory diseases and in monitoring lung function over time. However, most current AI models for COPD diagnosis are limited to outputting classification results without providing a rationale for their diagnostic process, while current Large Language Models (LLMs) cannot understand spirograms yet, which severely limits their clinical trust and adoption. To tackle this challenge, we leverage a cohort of 234,028 individuals from the UK Biobank (UKB) to propose SpiroLLM, the first multimodal large language model that can understand spirograms. The model extracts morphological features from respiratory curves via a SpiroEncoder and aligns them with PFT numerical values in a unified latent space using a SpiroProjector, ultimately empowering a large language model to generate a comprehensive diagnostic report. Experimental results confirm that SpiroLLM achieved a diagnostic AUROC of 0.8977 (95% CI: 0.88-0.91). In a robustness test with missing core data, it maintained a 100% valid response rate, far surpassing the 13.4% of a text-only model and showcasing the superiority of its multimodal design. This work demonstrates the substantial potential of deeply fusing physiological signals with large language models, establishing a new paradigm for the next generation of interpretable and reliable clinical decision support tools.

TimeSeries2Report prompting enables adaptive large language model management of lithium-ion batteries 2025-12-18

Large language models (LLMs) offer promising capabilities for interpreting multivariate time-series data, yet their application to real-world battery energy storage system (BESS) operation and maintenance remains largely unexplored. Here, we present TimeSeries2Report (TS2R), a prompting framework that converts raw lithium-ion battery operational time-series into structured, semantically enriched reports, enabling LLMs to reason, predict, and make decisions in BESS management scenarios. TS2R encodes short-term temporal dynamics into natural language through a combination of segmentation, semantic abstraction, and rule-based interpretation, effectively bridging low-level sensor signals with high-level contextual insights. We benchmark TS2R across both lab-scale and real-world datasets, evaluating report quality and downstream task performance in anomaly detection, state-of-charge prediction, and charging/discharging management. Compared with vision-, embedding-, and text-based prompting baselines, report-based prompting via TS2R consistently improves LLM performance across accuracy, robustness, and explainability metrics. Notably, TS2R-integrated LLMs achieve expert-level decision quality and predictive consistency without retraining or architecture modification, establishing a practical path for adaptive, LLM-driven battery intelligence.
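
The TS2R idea, turning a raw signal into a structured textual report via segmentation and rule-based abstraction, can be miniaturized as below. The thresholds, wording, and toy discharge curve are invented for illustration; the actual framework adds semantic abstraction on top.

```python
# Segment a battery voltage trace and render one report line per segment.
import numpy as np

def ts_to_report(v, seg=60):
    lines = []
    for k, chunk in enumerate(v.reshape(-1, seg)):
        slope = np.polyfit(np.arange(seg), chunk, 1)[0]
        trend = ("rising" if slope > 1e-4 else
                 "falling" if slope < -1e-4 else "flat")
        lines.append(f"segment {k}: mean {chunk.mean():.3f} V, "
                     f"range {np.ptp(chunk):.3f} V, trend {trend}")
    return "\n".join(lines)

t = np.arange(240)
voltage = 3.7 + 0.3 * np.exp(-t / 120) - 0.001 * t   # toy discharge curve
print(ts_to_report(voltage))
```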

Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting 2025-12-18

Time series forecasting always faces the challenge of concept drift, where data distributions evolve over time, leading to a decline in forecast model performance. Existing solutions are based on online learning, which continually organize recent time series observations as new training samples and update model parameters according to the forecasting feedback on recent data. However, they overlook a critical issue: obtaining ground-truth future values of each sample should be delayed until after the forecast horizon. This delay creates a temporal gap between the training samples and the test sample. Our empirical analysis reveals that the gap can introduce concept drift, causing forecast models to adapt to outdated concepts. In this paper, we present Proceed, a novel proactive model adaptation framework for online time series forecasting. Proceed first estimates the concept drift between the recently used training samples and the current test sample. It then employs an adaptation generator to efficiently translate the estimated drift into parameter adjustments, proactively adapting the model to the test sample. To enhance the generalization capability of the framework, Proceed is trained on synthetic diverse concept drifts. Extensive experiments on five real-world datasets across various forecast models demonstrate that Proceed brings more performance improvements than the state-of-the-art online learning methods, significantly facilitating forecast models' resilience against concept drifts. Code is available at https://github.com/SJTU-DMTai/OnlineTSF.

Accepted by KDD 2025. This version fixed typos in Eq. (3)

SEED: Spectral Entropy-Guided Evaluation of SpatialTemporal Dependencies for Multivariate Time Series Forecasting 2025-12-18

Effective multivariate time series forecasting often benefits from accurately modeling complex inter-variable dependencies. However, existing attention- or graph-based methods face three key issues: (a) strong temporal self-dependencies are often disrupted by irrelevant variables; (b) softmax normalization ignores and reverses negative correlations; (c) variables struggle to perceive their temporal positions. To address these, we propose **SEED**, a Spectral Entropy-guided Evaluation framework for spatial-temporal Dependency modeling. SEED introduces a Dependency Evaluator, a key innovation that leverages spectral entropy to dynamically provide a preliminary evaluation of the spatial and temporal dependencies of each variable, enabling the model to adaptively balance Channel Independence (CI) and Channel Dependence (CD) strategies. To account for temporal regularities originating from the influence of other variables rather than intrinsic dynamics, we propose a Spectral Entropy-based Fuser to further refine the evaluated dependency weights, effectively separating this part. Moreover, to preserve negative correlations, we introduce a Signed Graph Constructor that enables signed edge weights, overcoming the limitations of softmax. Finally, to help variables perceive their temporal positions and thereby construct more comprehensive spatial features, we introduce the Context Spatial Extractor, which leverages local contextual windows to extract spatial features. Extensive experiments on 12 real-world datasets from various application domains demonstrate that SEED achieves state-of-the-art performance, validating its effectiveness and generality.
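
Spectral entropy, the quantity SEED's evaluator keys on, is the normalized Shannon entropy of the power spectrum: near 0 for a near-periodic series, near 1 for white noise. A minimal implementation of the standard definition:

```python
# Normalized Shannon entropy of the periodogram (DC bin dropped).
import numpy as np

def spectral_entropy(x):
    psd = np.abs(np.fft.rfft(x - x.mean()))**2
    psd = psd[1:]                          # drop the DC component
    p = psd / psd.sum()                    # spectrum as a probability mass
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(psd)))

rng = np.random.default_rng(0)
t = np.arange(512)
print(spectral_entropy(np.sin(2 * np.pi * t / 32)))  # ~0: one dominant tone
print(spectral_entropy(rng.normal(size=512)))        # ~1: flat spectrum
```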

Conversational Time Series Foundation Models: Towards Explainable and Effective Forecasting 2025-12-17

The proliferation of time series foundation models has created a landscape where no single method achieves consistent superiority, framing the central challenge not as finding the best model, but as orchestrating an optimal ensemble with interpretability. While Large Language Models (LLMs) offer powerful reasoning capabilities, their direct application to time series forecasting has proven ineffective. We address this gap by repositioning the LLM as an intelligent judge that evaluates, explains, and strategically coordinates an ensemble of foundation models. To overcome the LLM's inherent lack of domain-specific knowledge on time series, we introduce an R1-style finetuning process, guided by SHAP-based faithfulness scores, which teaches the model to interpret ensemble weights as meaningful causal statements about temporal dynamics. The trained agent then engages in iterative, multi-turn conversations to perform forward-looking assessments, provide causally-grounded explanations for its weighting decisions, and adaptively refine the optimization strategy. Validated on the GIFT-Eval benchmark on 23 datasets across 97 settings, our approach significantly outperforms leading time series foundation models on both CRPS and MASE metrics, establishing new state-of-the-art results.

31 pages

Enhancing Line Density Plots with Outlier Control and Bin-based Illumination 2025-12-17

Density plots effectively summarize large numbers of points, which would otherwise lead to severe overplotting in, for example, a scatter plot. However, when applied to line-based datasets, such as trajectories or time series, density plots alone are insufficient, as they disrupt path continuity, obscuring smooth trends and rare anomalies. We propose a bin-based illumination model that decouples structure from density to enhance flow and reveal sparse outliers while preserving the original colormap. We introduce a bin-based outlierness metric to rank trajectories. Guided by this ranking, we construct a structural normal map and apply locally-adaptive lighting in the luminance channel to highlight chosen patterns -- from dominant trends to atypical paths -- with acceptable color distortion. Our interactive method enables analysts to prioritize main trends, focus on outliers, or strike a balance between the two. We demonstrate our method on several real-world datasets, showing it reveals details missed by simpler alternatives, achieves significantly lower CIEDE2000 color distortion than standard shading, and supports interactive updates for up to 10,000 lines.

Accepted by PacificVis 2026 (TVCG Journal Track)

Concurrence: A dependence criterion for time series, applied to biological data 2025-12-17

Measuring the statistical dependence between observed signals is a primary tool for scientific discovery. However, biological systems often exhibit complex non-linear interactions that currently cannot be captured without a priori knowledge or large datasets. We introduce a criterion for dependence, whereby two time series are deemed dependent if one can construct a classifier that distinguishes between temporally aligned vs. misaligned segments extracted from them. We show that this criterion, concurrence, is theoretically linked with dependence, and can become a standard approach for scientific analyses across disciplines, as it can expose relationships across a wide spectrum of signals (fMRI, physiological and behavioral data) without ad-hoc parameter tuning or large amounts of data.
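
The criterion is directly operational: declare dependence if a classifier separates temporally aligned from misaligned segment pairs above chance. A toy demonstration; the features (pointwise products) and classifier are illustrative choices, not the paper's.

```python
# Aligned vs. misaligned segment-pair classification as a dependence test.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
z = rng.normal(size=4000)                      # shared latent driver
x = z + 0.5 * rng.normal(size=4000)
y = 0.8 * z + 0.6 * rng.normal(size=4000)

seg = 20
xs, ys = x.reshape(-1, seg), y.reshape(-1, seg)
mis = np.roll(ys, 7, axis=0)                   # pair each x-segment wrongly
feats = np.vstack([xs * ys, xs * mis])         # pointwise products per pair
labels = np.r_[np.ones(len(xs)), np.zeros(len(xs))]

acc = cross_val_score(LogisticRegression(max_iter=1000), feats, labels, cv=5)
print(f"aligned-vs-misaligned accuracy: {acc.mean():.2f}")  # >0.5: dependent
```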

Forking-Sequences 2025-12-17

While accuracy is a critical requirement for time series forecasting models, an equally important (yet often overlooked) desideratum is forecast stability across forecast creation dates (FCDs). Even highly accurate models can produce erratic revisions between FCDs, undermining stakeholder trust and disrupting downstream decision-making. To improve forecast stability, models like MQCNN, MQT, and SPADE employ a little-known but highly effective technique: forking-sequences. Unlike standard statistical and neural forecasting methods that treat each FCD independently, the forking-sequences method jointly encodes and decodes the entire time series across all FCDs, in a way mirroring time series cross-validation. Since forking-sequences remains largely unknown in the broader neural forecasting community, in this work, we formalize the forking-sequences approach, and we make a case for its broader adoption. We demonstrate three key benefits of forking-sequences: (i) more stable and consistent gradient updates during training; (ii) reduced forecast variance through ensembling; and (iii) improved inference computational efficiency. We validate forking-sequences' benefits using 16 datasets from the M1, M3, M4, and Tourism competitions, showing improvements in forecast percentage change stability of 28.8%, 28.8%, 37.9%, 31.3%, and 8.8% on average for MLP, RNN, LSTM, CNN, and Transformer-based architectures, respectively.

Presented at the GPU-Accelerated and Scalable Optimization (ScaleOpt) Workshop, NeurIPS 2025
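
The mechanical difference from standard windowing is simple to show: forking-sequences yields a target window for every forecast creation date (FCD) of a series, rather than sampling a single cutoff. A schematic sketch with illustrative shapes and names:

```python
# Enumerate (history, future) training pairs for every FCD of one series.
import numpy as np

def forking_windows(y, context, horizon):
    pairs = []
    for fcd in range(context, len(y) - horizon + 1):
        history = y[:fcd]                 # everything observed up to the FCD
        future = y[fcd:fcd + horizon]     # target window starting at the FCD
        pairs.append((history, future))
    return pairs

y = np.arange(24, dtype=float)            # toy monthly series
pairs = forking_windows(y, context=6, horizon=4)
print(len(pairs))                         # 15 FCDs from a single series
print(pairs[0][0].shape, pairs[0][1].shape)  # (6,) and (4,) at the first FCD
```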

Scientific Machine Learning of Chaotic Systems Discovers Governing Equations for Neural Populations 2025-12-17

Discovering governing equations that describe complex chaotic systems remains a fundamental challenge in physics and neuroscience. Here, we introduce the PEM-UDE method, which combines the prediction-error method with universal differential equations to extract interpretable mathematical expressions from chaotic dynamical systems, even with limited or noisy observations. This approach succeeds where traditional techniques fail by smoothing optimization landscapes and removing the chaotic properties during the fitting process without distorting optimal parameters. We demonstrate its efficacy by recovering hidden states in the Rössler system and reconstructing dynamics from noise-corrupted electrical-circuit data, in which the correct functional form of the dynamics is recovered even when one of the observed time series is corrupted by noise 5x the magnitude of the true signal. We demonstrate that this method can recover the correct dynamics, whereas direct symbolic regression methods, such as STLSQ, fail to do so with the available data and noise. Importantly, when applied to neural populations, our method derives novel governing equations that respect biological constraints such as network sparsity - a constraint necessary for cortical information processing yet not captured in next-generation neural mass models - while preserving microscale neuronal parameters. These equations predict an emergent relationship between connection density and both oscillation frequency and synchrony in neural circuits. We validate these predictions using three intracranial electrode recording datasets from the medial entorhinal cortex, prefrontal cortex, and orbitofrontal cortex. Our work provides a pathway to develop mechanistic, multi-scale brain models that generalize across diverse neural architectures, bridging the gap between single-neuron dynamics and macroscale brain activity.

46 pages, 9 figures
A Multivariate Statistical Framework for Detection, Classification and Pre-localization of Anomalies in Water Distribution Networks 2025-12-17

This paper presents a unified framework for the detection, classification, and preliminary localization of anomalies in water distribution networks using multivariate statistical analysis. The approach, termed SICAMS (Statistical Identification and Classification of Anomalies in Mahalanobis Space), processes heterogeneous pressure and flow sensor data through a whitening transformation to eliminate spatial correlations among measurements. Based on the transformed data, the Hotelling's $T^2$ statistic is constructed, enabling the formulation of anomaly detection as a statistical hypothesis test of network conformity to normal operating conditions. It is shown that Hotelling's $T^2$ statistic can serve as an integral indicator of the overall "health" of the system, exhibiting correlation with total leakage volume, and thereby enabling approximate estimation of water losses via a regression model. A heuristic algorithm is developed to analyze the $T^2$ time series and classify detected anomalies into abrupt leaks, incipient leaks, and sensor malfunctions. Furthermore, a coarse leak localization method is proposed, which ranks sensors according to their statistical contribution and employs Laplacian interpolation to approximate the affected region within the network. Application of the proposed framework to the BattLeDIM L-Town benchmark dataset demonstrates high sensitivity and reliability in leak detection, maintaining robust performance even under multiple leaks. These capabilities make the method applicable to real-world operational environments without the need for a calibrated hydraulic model.

48 pages, 18 figures, 3 tables
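
The detection step reduces to whitening against a fault-free baseline and comparing Hotelling's $T^2$ with a chi-squared control limit. A synthetic sketch of that step alone (the classification and localization stages are not reproduced):

```python
# Hotelling's T^2 monitoring: baseline statistics, per-step quadratic form,
# chi-squared threshold. A mean shift is injected from index 150 onward.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p, n_train, n_test = 5, 500, 200
baseline = rng.multivariate_normal(np.zeros(p), 0.5 * np.eye(p), n_train)

mu = baseline.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

test = rng.multivariate_normal(np.zeros(p), 0.5 * np.eye(p), n_test)
test[150:] += np.array([0.0, 0.0, 2.5, 0.0, 1.0])   # simulated "leak"

d = test - mu
t2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)        # T^2 for each time step
limit = stats.chi2.ppf(0.99, df=p)                  # large-sample 99% limit
print("alarm indices:", np.flatnonzero(t2 > limit)) # cluster at >= 150
```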

ST-DETrack: Identity-Preserving Branch Tracking in Entangled Plant Canopies via Dual Spatiotemporal Evidence 2025-12-17

Automated extraction of individual plant branches from time-series imagery is essential for high-throughput phenotyping, yet it remains computationally challenging due to non-rigid growth dynamics and severe identity fragmentation within entangled canopies. To overcome these stage-dependent ambiguities, we propose ST-DETrack, a spatiotemporal-fusion dual-decoder network designed to preserve branch identity from budding to flowering. Our architecture integrates a spatial decoder, which leverages geometric priors such as position and angle for early-stage tracking, with a temporal decoder that exploits motion consistency to resolve late-stage occlusions. Crucially, an adaptive gating mechanism dynamically shifts reliance between these spatial and temporal cues, while a biological constraint based on negative gravitropism mitigates vertical growth ambiguities. Validated on a Brassica napus dataset, ST-DETrack achieves a Branch Matching Accuracy (BMA) of 93.6%, significantly outperforming spatial and temporal baselines by 28.9 and 3.3 percentage points, respectively. These results demonstrate the method's robustness in maintaining long-term identity consistency amidst complex, dynamic plant architectures.

Under Review at IEEE Transactions on Image Processing

A Regime-Aware Fusion Framework for Time Series Classification 2025-12-17

Kernel-based methods such as Rocket are among the most effective default approaches for univariate time series classification (TSC), yet they do not perform equally well across all datasets. We revisit the long-standing intuition that different representations capture complementary structure and show that selectively fusing them can yield consistent improvements over Rocket on specific, systematically identifiable kinds of datasets. We introduce Fusion-3 (F3), a lightweight framework that adaptively fuses Rocket, Sax, and Sfa representations. To understand when fusion helps, we cluster UCR datasets into six groups using meta-features capturing series length, spectral structure, roughness, and class imbalance, and treat these clusters as interpretable data-structure regimes. Our analysis shows that fusion typically outperforms strong baselines in regimes with structured variability or rich frequency content, while offering diminishing returns in highly irregular or outlier-heavy settings. To support these findings, we combine three complementary analyses: non-parametric paired statistics across datasets, ablation studies isolating the roles of individual representations, and attribution via SHAP to identify which dataset properties predict fusion gains. Sample-level case studies further reveal the underlying mechanism: fusion primarily improves performance by rescuing specific errors, with adaptive increases in frequency-domain weighting precisely where corrections occur. Using 5-fold cross-validation on the 113 UCR datasets, F3 yields small but consistent average improvements over Rocket, supported by frequentist and Bayesian evidence and accompanied by clearly identifiable failure cases. Our results show that selectively applied fusion provides a dependable and interpretable extension to strong kernel-based methods, correcting their weaknesses precisely where the data support it.

CANet: ChronoAdaptive Network for Enhanced Long-Term Time Series Forecasting under Non-Stationarity 2025-12-17

Long-term time series forecasting plays a pivotal role in various real-world applications. Despite recent advancements and the success of different architectures, forecasting is often challenging due to the non-stationary nature of real-world data, which frequently exhibit distribution shifts and temporal changes in statistical properties like mean and variance over time. Previous studies suggest that this inherent variability complicates forecasting, limiting the performance of many models by leading to loss of non-stationarity and resulting in over-stationarization (Liu, Wu, Wang and Long, 2022). To address this challenge, we introduce a novel architecture, ChronoAdaptive Network (CANet), inspired by style-transfer techniques. The core of CANet is the Non-stationary Adaptive Normalization module, seamlessly integrating the Style Blending Gate and Adaptive Instance Normalization (AdaIN) (Huang and Belongie, 2017). The Style Blending Gate preserves and reintegrates non-stationary characteristics, such as mean and standard deviation, by blending internal and external statistics, preventing over-stationarization while maintaining essential temporal dependencies. Coupled with AdaIN, which dynamically adapts the model to statistical changes, this approach enhances predictive accuracy under non-stationary conditions. CANet also employs multi-resolution patching to handle short-term fluctuations and long-term trends, along with Fourier analysis-based adaptive thresholding to reduce noise. A Stacked Kronecker Product Layer further optimizes the model's efficiency while maintaining high performance. Extensive experiments on real-world datasets validate CANet's superiority over state-of-the-art methods, achieving a 42% reduction in MSE and a 22% reduction in MAE. The source code is publicly available at https://github.com/mertsonmezer/CANet.
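
AdaIN, the style-transfer primitive CANet adapts, is easy to state for a 1-D series: strip the series' own mean and standard deviation, then re-dress it with target statistics. A minimal sketch, not CANet's learned module:

```python
# Adaptive instance normalization for a univariate series.
import numpy as np

def adain_1d(content, style, eps=1e-8):
    """Give `content` the mean and std of `style`."""
    normalized = (content - content.mean()) / (content.std() + eps)
    return normalized * style.std() + style.mean()

rng = np.random.default_rng(0)
content = np.sin(np.linspace(0, 8 * np.pi, 256))     # shape to keep
style = 5.0 + 2.0 * rng.normal(size=256)             # statistics to adopt
out = adain_1d(content, style)
print(f"mean {out.mean():.2f}, std {out.std():.2f}")  # ~5.00 and ~2.00
```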

On non-stationarity of the Poisson gamma state space models 2025-12-17

The Poisson-gamma state space (PGSS) models have been utilized in the analysis of non-negative integer-valued time series to sequentially obtain closed form filtering and predictive densities. In this study, we show the underlying mechanics and non-stationary properties of multi-step ahead predictive distributions for the PGSS family of models. By exploiting the non-stationary structure of the PGSS model, we establish that the predictive mean remains constant while the predictive variance diverges with the forecast horizon, a property also found in Gaussian random walk models. We show that, in the long run, the predictive distribution converges to a zero-degenerated distribution, such that both point and interval forecasts eventually converge towards zero. In doing so, we comment on the effect of hyper-parameters and the discount factor on the long-run behavior of the forecasts.

10 pages, 1 figure

FADTI: Fourier and Attention Driven Diffusion for Multivariate Time Series Imputation 2025-12-17

Multivariate time series imputation is fundamental in applications such as healthcare, traffic forecasting, and biological modeling, where sensor failures and irregular sampling lead to pervasive missing values. However, existing Transformer- and diffusion-based models lack explicit inductive biases and frequency awareness, limiting their generalization under structured missing patterns and distribution shifts. We propose FADTI, a diffusion-based framework that injects frequency-informed feature modulation via a learnable Fourier Bias Projection (FBP) module and combines it with temporal modeling through self-attention and gated convolution. FBP supports multiple spectral bases, enabling adaptive encoding of both stationary and non-stationary patterns. This design injects frequency-domain inductive bias into the generative imputation process. Experiments on multiple benchmarks, including a newly introduced biological time series dataset, show that FADTI consistently outperforms state-of-the-art methods, particularly under high missing rates. Code is available at https://anonymous.4open.science/r/TimeSeriesImputation-52BF

This work has been submitted to the IEEE for possible publication. 15 pages, 8 figures

WaveGNN: Integrating Graph Neural Networks and Transformers for Decay-Aware Classification of Irregular Clinical Time-Series 2025-12-17

Clinical time series are often irregularly sampled, with varying sensor frequencies, missing observations, and misaligned timestamps. Prior approaches typically address these irregularities by interpolating data into regular sequences, thereby introducing bias, or by generating inconsistent and uninterpretable relationships across sensor measurements, complicating the accurate learning of both intra-series and inter-series dependencies. We introduce WaveGNN, a model that operates directly on irregular multivariate time series without interpolation or conversion to a regular representation. WaveGNN combines a decay-aware Transformer to capture intra-series dynamics with a sample-specific graph neural network that models both short-term and long-term inter-sensor relationships. Therefore, it generates a single, sparse, and interpretable graph per sample. Across multiple benchmark datasets (P12, P19, MIMIC-III, and PAM), WaveGNN delivers consistently strong performance, whereas other state-of-the-art baselines tend to perform well on some datasets or tasks but poorly on others. While WaveGNN does not necessarily surpass every method in every case, its consistency and robustness across diverse settings set it apart. Moreover, the learned graphs align well with known physiological structures, enhancing interpretability and supporting clinical decision-making.

Accepted to IEEE BigData 2025. Author-accepted manuscript

Functional Periodic ARMA Processes 2025-12-16

Periodicity is a common feature of time series. For finite-dimensional data, periodic autoregressive moving average (ARMA) models have been extensively studied. In functional time series analysis, AR models have been extended to incorporate periodicity, but existing approaches remain incomplete and do not cover the ARMA setting. This paper develops a rigorous theoretical framework for functional periodic ARMA (fPARMA) processes in general separable Hilbert spaces. The proposed model class accommodates periodically varying dependence structures. We derive sufficient conditions for periodic stationarity, the existence of finite moments, and weak dependence. Moreover, we study Yule-Walker-type estimators for the fPAR operators and, in a specific setting, estimators for the fPARMA operators, and establish convergence rates under Sobolev-type regularity assumptions.

27 pages
A Path Signature Framework for Detecting Creative Fatigue in Digital Advertising 2025-12-16

This paper introduces a signature-based framework for detecting advertising creative fatigue using path signatures, a geometric representation from rough path theory. Creative fatigue -- the degradation of creative effectiveness under repeated exposure -- is operationally important in digital marketing because delayed detection can translate directly into avoidable opportunity cost. We reframe fatigue monitoring as a geometric change detection problem: advertising performance trajectories are embedded as paths and represented by truncated (log-)signatures, enabling detection of changes in trend, volatility, and non-linear dynamics beyond simple mean or variance shifts. We further connect statistical detection to managerial decision-making via an explicit quantification of performance loss relative to a benchmark period. Because proprietary production data cannot be released, we evaluate the proposed framework on a synthetic panel dataset designed to mimic realistic impression volumes and noisy day-to-day CTR dynamics. We define observed CTR as the realised binomial rate $CTR_t := C_t/I_t$ using daily clicks $C_t$ and impressions $I_t$. The accompanying CSV also contains a pre-computed CTR field (e.g., due to rounding or upstream derivation), but all modelling and evaluation in this paper use $C_t/I_t$. Crucially, the dataset does not include injected changepoints; we therefore define an operational ground truth for "fatigue onset" based on a noise-robust CTR estimate and a sustained deterioration relative to a recent-best baseline. We report lead-time (early warning) and alert-burden metrics under this operational definition, and provide a sensitivity analysis over the detector's primary tuning parameters. The methodology scales linearly in time-series length for fixed signature depth and is suitable for monitoring large creative portfolios.

version 3
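
For readers new to signatures: the depth-2 signature of a $d$-dimensional path collects total increments (level 1) and iterated integrals (level 2), whose antisymmetric part is the Lévy area. A from-scratch sketch; production code would typically use a library such as iisignature or signatory.

```python
# Depth-2 signature of a piecewise-linear path via Chen's update rule.
import numpy as np

def signature_depth2(path):
    """Levels 1 and 2 of the signature of a path of shape (T, d)."""
    inc = np.diff(path, axis=0)                    # segment increments
    d = path.shape[1]
    level2 = np.zeros((d, d))
    run = np.zeros(d)                              # level 1 so far
    for step in inc:
        level2 += np.outer(run, step) + 0.5 * np.outer(step, step)
        run += step
    return run, level2                             # level2[i,j] = iint dx_i dx_j

t = np.linspace(0, 1, 100)
circle = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
s1, s2 = signature_depth2(circle)
print(np.round(s1, 3))         # ~[0, 0]: the loop closes
print(np.round(s2 - s2.T, 3))  # off-diagonals ~ +/-2*pi: twice enclosed area
```
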
Unitless Unrestricted Markov-Consistent SCM Generation: Better Benchmark Datasets for Causal Discovery 2025-12-16

Causal discovery aims to extract qualitative causal knowledge in the form of causal graphs from data. Because causal ground truth is rarely known in the real world, simulated data plays a vital role in evaluating the performance of the various causal discovery algorithms proposed in the literature. But recent work highlighted certain artifacts of commonly used data generation techniques for a standard class of structural causal models (SCM) that may be nonphysical, including var- and R2-sortability, where the variables' variance and coefficients of determination (R2) after regressing on all other variables, respectively, increase along the causal order. Some causal methods exploit such artifacts, leading to unrealistic expectations for their performance on real-world data. Some modifications have been proposed to remove these artifacts; notably, the internally-standardized structural causal model (iSCM) avoids varsortability and largely alleviates R2-sortability on sparse causal graphs, but exhibits a reversed R2-sortability pattern for denser graphs not featured in their work. We analyze which sortability patterns we expect to see in real data, and propose a method for drawing coefficients that we argue more effectively samples the space of SCMs. Finally, we propose a novel extension of our SCM generation method to the time series setting.

4th Conference on Causal Learning and Reasoning. Code published in python package "UUMCdata" (https://pypi.org/project/UUMCdata/)

Asymptotic Inference for Rank Correlations 2025-12-16

Kendall's tau and Spearman's rho are widely used tools for measuring dependence. Surprisingly, when it comes to asymptotic inference for these rank correlations, some fundamental results and methods have not yet been developed, in particular for discrete random variables and in the time series case, and concerning variance estimation in general. Consequently, asymptotic confidence intervals are not available. We provide a comprehensive treatment of asymptotic inference for classical rank correlations, including Kendall's tau, Spearman's rho, Goodman-Kruskal's gamma, Kendall's tau-b, and grade correlation. We derive asymptotic distributions for both iid and time series data, resorting to asymptotic results for U-statistics, and introduce consistent variance estimators. This enables the construction of confidence intervals and tests, generalizes classical results for continuous random variables and leads to corrected versions of widely used tests of independence. We analyze the finite-sample performance of our variance estimators, confidence intervals, and tests in simulations and illustrate their use in case studies.
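
As a point of reference for the inference problem this paper solves analytically, the sketch below computes Kendall's tau on a dependent bivariate AR(1) pair and a crude circular block-bootstrap interval. It is a resampling stand-in, not the paper's asymptotic confidence intervals.

```python
# Kendall's tau with a circular block bootstrap for serially dependent data.
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(0)
n, block = 500, 25
e = rng.normal(size=(n, 2))
x, y = np.zeros(n), np.zeros(n)
for t in range(1, n):                         # AR(1) pair with a common shock
    x[t] = 0.6 * x[t-1] + e[t, 0]
    y[t] = 0.6 * y[t-1] + 0.5 * e[t, 0] + e[t, 1]

tau, _ = kendalltau(x, y)
boot = []
for _ in range(500):
    starts = rng.integers(0, n, size=n // block)
    idx = ((starts[:, None] + np.arange(block)) % n).ravel()
    boot.append(kendalltau(x[idx], y[idx])[0])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"tau = {tau:.3f}, 95% CI ~ ({lo:.3f}, {hi:.3f})")
```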

Semiparametric inference for impulse response functions using double/debiased machine learning 2025-12-16
Show

We introduce a double/debiased machine learning estimator for the impulse response function in settings where a time series of interest is subjected to multiple discrete treatments, assigned over time, which can have a causal effect on future outcomes. The proposed estimator can rely on fully nonparametric relations between treatment and outcome variables, opening up the possibility to use flexible machine learning approaches to estimate impulse response functions. To this end, we extend the theory of double machine learning from an i.i.d. to a time series setting and show that the proposed estimator is consistent and asymptotically normally distributed at the parametric rate, allowing for semiparametric inference for dynamic effects in a time series setting. The properties of the estimator are validated numerically in finite samples by applying it to learn the impulse response function in the presence of serial dependence in both the confounder and observation innovation processes. We also illustrate the methodology empirically by applying it to the estimation of the effects of macroeconomic shocks.

Counterfactual Explanations for Time Series Should be Human-Centered and Temporally Coherent in Interventions 2025-12-16
Show

Counterfactual explanations are increasingly proposed as interpretable mechanisms to achieve algorithmic recourse. However, current counterfactual techniques for time series classification are predominantly designed with static data assumptions and focus on generating minimal input perturbations to flip model predictions. This paper argues that such approaches are fundamentally insufficient in clinical recommendation settings, where interventions unfold over time and must be causally plausible and temporally coherent. We advocate for a shift towards counterfactuals that reflect sustained, goal-directed interventions aligned with clinical reasoning and patient-specific dynamics. We identify critical gaps in existing methods that limit their practical applicability, specifically, temporal blind spots and the lack of user-centered considerations in both method design and evaluation metrics. To support our position, we conduct a robustness analysis of several state-of-the-art methods for time series and show that the generated counterfactuals are highly sensitive to stochastic noise. This finding highlights their limited reliability in real-world clinical settings, where minor measurement variations are inevitable. We conclude by calling for methods and evaluation frameworks that go beyond mere prediction changes without considering feasibility or actionability. We emphasize the need for actionable, purpose-driven interventions that are feasible in real-world contexts for the users of such applications.

Two Bayesian Approaches to Dynamic Gaussian Bayesian Networks with Intra- and Inter-Slice Edges 2025-12-16
Show

Gaussian Dynamic Bayesian Networks (GDBNs) are a widely used tool for learning network structures from continuous time-series data. To capture both time-lagged and contemporaneous dependencies, advanced GDBNs allow for dynamic inter-slice edges as well as static intra-slice edges. In the literature, two Bayesian modeling approaches have been developed for GDBNs. Both build on and extend the well-known Gaussian BGe score. We refer to them as the mean-adjusted BGe (mBGe) and the extended BGe (eBGe) models. In this paper, we contrast the two models and compare their performance empirically. The main finding of our study is that the two models induce different equivalence classes of network structures. In particular, the equivalence classes implied by the eBGe model are non-standard, and we propose a new variant of the DAG-to-CPDAG algorithm to identify them. To the best of our knowledge, these non-standard equivalence classes have not been previously reported.

24 pages, 12 figures
TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting 2025-12-16
Show

Foundation models for zero-shot time series forecasting face challenges in efficient long-horizon prediction and reproducibility, with existing synthetic-only approaches underperforming on challenging benchmarks. This paper presents TempoPFN, a univariate time series foundation model based on linear Recurrent Neural Networks (RNNs) pre-trained exclusively on synthetic data. The model uses a GatedDeltaProduct architecture with state-weaving for fully parallelizable training across sequence lengths, eliminating the need for windowing or summarization techniques while maintaining robust temporal state-tracking. Our comprehensive synthetic data pipeline unifies diverse generators, including stochastic differential equations, Gaussian processes, and audio synthesis, with novel augmentations. In zero-shot evaluations on the Gift-Eval, fev-bench and Chronos-ZS benchmarks, TempoPFN achieves top-tier competitive performance, outperforming all existing synthetic-only approaches and surpassing the majority of models trained on real-world data, while being more efficient than existing baselines by leveraging fully parallelizable training and inference. We open-source our complete data generation pipeline and training code, providing a reproducible foundation for future research.

38 pages, 22 figures, 17 tables

MSTN: Fast and Efficient Multivariate Time Series Prediction Model 2025-12-16
Show

Real-world time series often exhibit strong non-stationarity, complex nonlinear dynamics, and behaviour expressed across multiple temporal scales, from rapid local fluctuations to slow-evolving long-range trends. However, many contemporary architectures impose rigid, fixed-scale structural priors -- such as patch-based tokenization, predefined receptive fields, or frozen backbone encoders -- which can over-regularize temporal dynamics and limit adaptability to abrupt high-magnitude events. To handle this, we introduce the \emph{Multi-scale Temporal Network} (MSTN), a hybrid neural architecture grounded in an \emph{Early Temporal Aggregation} principle. MSTN integrates three complementary components: (i) a multi-scale convolutional encoder that captures fine-grained local structure; (ii) a sequence modeling module that learns long-range dependencies through either recurrent or attention-based mechanisms; and (iii) a self-gated fusion stage incorporating squeeze-excitation and multi-head attention to dynamically modulate cross-scale representations. This design enables MSTN to flexibly model temporal patterns spanning milliseconds to extended horizons, while avoiding the computational burden typically associated with long-context models. Across extensive benchmarks covering forecasting, imputation, classification, and cross-dataset generalization, MSTN consistently delivers state-of-the-art performance, outperforming recent leading approaches including TIME-LLM, HiMTM, SOFTS, LLM4TS, TimesNet, and PatchTST, and establishing new best results on 24 out of 32 datasets. Despite its strong performance, MSTN remains lightweight and supports fast inference, making it well suited for deployment on edge devices and resource-constrained environments.

30 pages, 1 figure, 5 tables

An Unsupervised Deep Explainable AI Framework for Localization of Concurrent Replay Attacks in Nuclear Reactor Signals 2025-12-16
Show

Next-generation advanced nuclear reactors are expected to be smaller in both size and power output, relying extensively on fully digital instrumentation and control systems. These reactors will generate a large flow of information in the form of multivariate time series data, simultaneously conveying various nonlinear cyber-physical, process, control, sensor, and operational states. Ensuring data integrity against deception attacks is becoming increasingly important for networked communication and is a requirement for safe and reliable operation. Current efforts to address replay attacks almost universally focus on watermarking or supervised anomaly detection approaches without further identifying and characterizing the root cause of the anomaly. In addition, these approaches rely mostly on synthetic data with uncorrelated Gaussian process and measurement noise and full state feedback, or are limited to univariate signals, signal stationarity, linear quadratic regulators, or other linear time-invariant state-space models, which may fail to capture unmodeled system dynamics. In the realm of regulated nuclear cyber-physical systems, additional work is needed on the characterization of replay attacks and the explainability of predictions using real data. Here, we propose an unsupervised explainable AI framework based on a combination of an autoencoder and a customized windowSHAP algorithm to fully characterize real-time replay attacks, i.e., their detection, source identification, timing, and type, of increasing complexity during a dynamic, time-evolving reactor process. The proposed XAI framework was benchmarked on several real-world datasets from Purdue's nuclear reactor PUR-1 with up to six signals concurrently being replayed. In all cases, the XAI framework was able to detect and identify the source and number of signals being replayed and the duration of the falsification with 95 percent or better accuracy.

Added references, corrected typos, grammar check, authors updated

Not All Data are Good Labels: On the Self-supervised Labeling for Time Series Forecasting 2025-12-16
Show

Time Series Forecasting (TSF) is a crucial task in various domains, yet existing TSF models rely heavily on high-quality data and insufficiently exploit all available data. This paper explores a novel self-supervised approach to re-label time series datasets by inherently constructing candidate datasets. During the optimization of a simple reconstruction network, intermediates are used as pseudo labels in a self-supervised paradigm, improving generalization for any predictor. We introduce the Self-Correction with Adaptive Mask (SCAM), which discards overfitted components and selectively replaces them with pseudo labels generated from reconstructions. Additionally, we incorporate Spectral Norm Regularization (SNR) to further suppress overfitting from a loss landscape perspective. Our experiments on eleven real-world datasets demonstrate that SCAM consistently improves the performance of various backbone models. This work offers a new perspective on constructing datasets and enhancing the generalization of TSF models through self-supervised learning. The code is available at https://github.com/SuDIS-ZJU/SCAM.

Accepted by NeurIPS'25 (spotlight)

Signature-Informed Selection Detection: A Novel Method for Multi-Locus Temporal Population Genetic Model with Recombination 2025-12-16
Show

In population genetics, there is often interest in inferring selection coefficients. This task becomes more challenging if multiple linked selected loci are considered simultaneously. For such a situation, we propose a novel generalized Bayesian framework where we compute a scoring rule posterior for the selection coefficients in multi-locus temporal population genetics models. As we consider trajectories of allele frequencies over time as our data, we choose to use a signature kernel scoring rule - a kernel scoring rule defined for high-dimensional time-series data using iterated integrals of a path (called signatures). We can compute an unbiased estimate of the signature kernel score using model simulations. This enables us to sample asymptotically from the signature kernel scoring rule posterior of the selection coefficients using pseudo-marginal MCMC-type algorithms. Through a simulation study, we show the inferential efficacy of our method compared to existing benchmark methods for two- and three-locus scenarios under the standard Wright-Fisher model with recombination and selection. We also consider a negative frequency-dependent selection model for one- and two-locus scenarios, as well as joint inference of selection coefficients and initial haplotype frequencies under the standard Wright-Fisher model. Finally, we illustrate the application of our inferential method on two real-life datasets. More specifically, we consider a data set on yeast, as well as data from an Evolve and Resequence (E&R) experiment on {\em Drosophila simulans}.

FLAME: Flow Enhanced Legendre Memory Models for General Time Series Forecasting 2025-12-16
Show

In this work, we introduce FLAME, a family of extremely lightweight and capable Time Series Foundation Models, which support both deterministic and probabilistic forecasting via generative probabilistic modeling, thus ensuring both efficiency and robustness. FLAME utilizes the Legendre Memory for strong generalization capabilities. By adapting variants of the Legendre Memory, i.e., translated Legendre (LegT) and scaled Legendre (LegS), in the encoding and decoding phases, FLAME can effectively capture the inherent inductive bias within data and make efficient long-range inferences. To enhance the accuracy of probabilistic forecasting while remaining efficient, FLAME adopts a normalizing-flow-based forecasting head, which can model arbitrarily intricate distributions over the forecasting horizon in a generative manner. Comprehensive experiments on well-recognized benchmarks, including TSFM-Bench and ProbTS, demonstrate the consistent state-of-the-art zero-shot performance of FLAME on both deterministic and probabilistic forecasting tasks.

IdealTSF: Can Non-Ideal Data Contribute to Enhancing the Performance of Time Series Forecasting Models? 2025-12-16
Show

Deep learning has shown strong performance in time series forecasting tasks. However, issues such as missing values and anomalies in sequential data hinder its further development in prediction tasks. Previous research has primarily focused on extracting feature information from sequence data or addressing these suboptimal data as positive samples for knowledge transfer. A more effective approach would be to leverage these non-ideal negative samples to enhance event prediction. In response, this study highlights the advantages of non-ideal negative samples and proposes the IdealTSF framework, which integrates both ideal positive and negative samples for time series forecasting. IdealTSF consists of three progressive steps: pretraining, training, and optimization. It first pretrains the model by extracting knowledge from negative sample data, then transforms the sequence data into ideal positive samples during training. Additionally, a negative optimization mechanism with adversarial disturbances is applied. Extensive experiments demonstrate that negative sample data unlocks significant potential within the basic attention architecture for time series forecasting. Therefore, IdealTSF is particularly well-suited for applications with noisy samples or low-quality data.

Accepted at AAAI 2026

Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models 2025-12-16
Show

Modern state-space models (SSMs) often utilize transition matrices which enable efficient computation but pose restrictions on the model's expressivity, as measured in terms of the ability to emulate finite-state automata (FSA). While unstructured transition matrices are optimal in terms of expressivity, they come at a prohibitively high compute and memory cost even for moderate state sizes. We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs. Our method, PD-SSM, parametrizes the transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$). Consequently, the computational cost of parallel scans scales linearly with the state size. Theoretically, the model is BIBO-stable and can emulate any $N$-state FSA with one layer of dimension $N$ and a linear readout of size $N \times N$, significantly improving on all current structured SSM guarantees. Experimentally, the model significantly outperforms a wide collection of modern SSM variants on various FSA state tracking tasks. On multiclass time-series classification, the performance is comparable to that of neural controlled differential equations, a paradigm explicitly built for time-series analysis. Finally, we integrate PD-SSM into a hybrid Transformer-SSM architecture and demonstrate that the model can effectively track the states of a complex FSA in which transitions are encoded as a set of variable-length English sentences. The code is available at https://github.com/IBM/expressive-sparse-state-space-model

10 pages, NeurIPS 2025 Spotlight
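
The efficiency claim for the $P D$ factorization is visible in a few lines. Below is a minimal sketch, assuming $P$ is stored as the row index of the single 1 in each column, so that one recurrence step costs O(N) rather than O(N^2); the function name and the pre-computed input term `Bu` are illustrative, not the interface of the released code.

```python
import numpy as np

def pd_ssm_step(x, col_index, d, Bu):
    """One PD-SSM-style recurrence step x' = (P D) x + B u, where P is a
    column one-hot matrix encoded by `col_index` (row position of the 1
    in each column) and D = diag(d) is a complex diagonal matrix."""
    scaled = d * x                         # apply D
    out = np.zeros_like(scaled)
    np.add.at(out, col_index, scaled)      # apply P: column j routes to row col_index[j]
    return out + Bu

# Example: a 4-state transition with cyclic routing and complex diagonal dynamics.
x = np.ones(4, dtype=complex)
col_index = np.array([1, 2, 3, 0])
d = np.exp(1j * np.linspace(0.1, 0.4, 4))
print(pd_ssm_step(x, col_index, d, np.zeros(4, dtype=complex)))
```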

MIRA: Medical Time Series Foundation Model for Real-World Health Data 2025-12-16
Show

A unified foundation model for medical time series -- pretrained on open access and ethics board-approved medical corpora -- offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. However, existing generalist time series foundation models struggle to handle medical time series data due to their inherent challenges, including irregular intervals, heterogeneous sampling rates, and frequent missing values. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes computation across latent frequency regimes to further promote temporal specialization, and a Continuous Dynamics Extrapolation Block based on Neural ODE that models the continuous trajectory of latent states, enabling accurate forecasting at arbitrary target timestamps. Pretrained on a large-scale and diverse medical corpus comprising over 454 billion time points collected from publicly available datasets, MIRA achieves reductions in forecasting errors by an average of 10% and 7% in out-of-distribution and in-distribution scenarios, respectively, when compared to other zero-shot and fine-tuned baselines. We also introduce a comprehensive benchmark spanning multiple downstream clinical tasks, establishing a foundation for future research in medical time series modeling.

NeurIPS 2025 Main Conference

Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective 2025-12-16
Show

Time Series Forecasting has made significant progress with the help of the patching technique, which partitions time series into multiple patches to effectively retain contextual semantic information in a representation space beneficial for modeling long-term dependencies. However, conventional patching partitions a time series into adjacent patches, which causes a fixed representation space, thus resulting in insufficiently expressive representations. In this paper, we pioneer the exploration of constructing a selective representation space to flexibly include the most informative patches for forecasting. Specifically, we propose the Selective Representation Space (SRS) module, which utilizes the learnable Selective Patching and Dynamic Reassembly techniques to adaptively select and shuffle the patches from the contextual time series, aiming at fully exploiting the information of contextual time series to enhance the forecasting performance of patch-based models. To demonstrate the effectiveness of the SRS module, we propose a simple yet effective SRSNet consisting of SRS and an MLP head, which achieves state-of-the-art performance on real-world datasets from multiple domains. Furthermore, as a novel plug-and-play module, SRS can also enhance the performance of existing patch-based models. The resources are available at https://github.com/decisionintelligence/SRSNet.

Optimizing Multi-Tier Supply Chain Ordering with a Hybrid Liquid Neural Network and Extreme Gradient Boosting Model 2025-12-16
Show

Supply chain management (SCM) faces significant challenges like demand fluctuations and the bullwhip effect. Traditional methods and even state-of-the-art LLMs struggle with benchmarks like the Vending Machine Test, failing to handle SCM's complex continuous time-series data. While ML approaches like LSTM and XGBoost offer solutions, they are often limited by computational inefficiency. Liquid Neural Networks (LNN), known for their adaptability and efficiency in robotics, remain untapped in SCM. This study proposes a hybrid LNN+XGBoost model for multi-tier supply chains. By combining LNN's dynamic feature extraction with XGBoost's global optimization, the model aims to minimize the bullwhip effect and increase profitability. This innovative approach addresses the need for efficiency and adaptability, filling a critical gap in intelligent SCM.

FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis 2025-12-16
Show

Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology, underpinning key tasks including classification, forecasting, and anomaly detection. Although deep learning models have achieved remarkable progress in these areas in recent years, constructing an efficient, multi-task compatible, and generalizable unified framework for time series analysis remains a significant challenge. Existing approaches are often tailored to single tasks or specific data types, making it difficult to simultaneously handle multi-task modeling and effectively integrate information across diverse time series types. Moreover, real-world data are often affected by noise, complex frequency components, and multi-scale dynamic patterns, which further complicate robust feature extraction and analysis. To address these challenges, we propose FusAD, a unified analysis framework designed for diverse time series tasks. FusAD features an adaptive time-frequency fusion mechanism, integrating both Fourier and Wavelet transforms to efficiently capture global-local and multi-scale dynamic features. With an adaptive denoising mechanism, FusAD automatically senses and filters various types of noise, highlighting crucial sequence variations and enabling robust feature extraction in complex environments. In addition, the framework integrates a general information fusion and decoding structure, combined with masked pre-training, to promote efficient learning and transfer of multi-granularity representations. Extensive experiments demonstrate that FusAD consistently outperforms state-of-the-art models on mainstream time series benchmarks for classification, forecasting, and anomaly detection tasks, while maintaining high efficiency and scalability. Code is available at https://github.com/zhangda1018/FusAD.

Paper has been accepted by ICDE2026

Transparent Networks for Multivariate Time Series 2025-12-16
Show

Transparent models, which provide inherently interpretable predictions, are receiving significant attention in high-stakes domains. However, despite much real-world data being collected as time series, there is a lack of studies on transparent time series models. To address this gap, we propose a novel transparent neural network model for time series called Generalized Additive Time Series Model (GATSM). GATSM consists of two parts: 1) independent feature networks to learn feature representations, and 2) a transparent temporal module to learn temporal patterns across different time steps using the feature representations. This structure allows GATSM to effectively capture temporal patterns and handle varying-length time series while preserving transparency. Empirical experiments show that GATSM significantly outperforms existing generalized additive models and achieves comparable performance to black-box time series models, such as recurrent neural networks and Transformers. In addition, we demonstrate that GATSM finds interesting patterns in time series.

AAAI-26 Special Track on AI Alignment

Multivariate Time Series Forecasting with Hybrid Euclidean-SPD Manifold Graph Neural Networks 2025-12-16
Show

Multivariate Time Series (MTS) forecasting plays a vital role in various real-world applications, such as traffic management and predictive maintenance. Existing approaches typically model MTS data in either Euclidean or Riemannian space, limiting their ability to capture the diverse geometric structures and complex spatio-temporal dependencies inherent in real-world data. To overcome this limitation, we propose the Hybrid Symmetric Positive-Definite Manifold Graph Neural Network (HSMGNN), a novel graph neural network-based model that captures data geometry within a hybrid Euclidean-Riemannian framework. To the best of our knowledge, this is the first work to leverage hybrid geometric representations for MTS forecasting, enabling expressive and comprehensive modeling of geometric properties. Specifically, we introduce a Submanifold-Cross-Segment (SCS) embedding to project input MTS into both Euclidean and Riemannian spaces, thereby capturing spatio-temporal variations across distinct geometric domains. To alleviate the high computational cost of Riemannian distance, we further design an Adaptive-Distance-Bank (ADB) layer with a trainable memory mechanism. Finally, a Fusion Graph Convolutional Network (FGCN) is devised to integrate features from the dual spaces via a learnable fusion operator for accurate prediction. Experiments on three benchmark datasets demonstrate that HSMGNN achieves up to a 13.8 percent improvement over state-of-the-art baselines in forecasting accuracy.

Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting 2025-12-16
Show

Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box base learners fall under the umbrella of "stacked generalization," namely training an ML algorithm that takes the inferences from the base learners as input. While stacking has been widely applied in practice, its theoretical properties are poorly understood. In this paper, we prove a novel result, showing that choosing the best stacked generalization from a (finite or finite-dimensional) family of stacked generalizations based on cross-validated performance does not perform "much worse" than the oracle best. Our result strengthens and significantly extends the results in Van der Laan et al. (2007). Inspired by the theoretical analysis, we further propose a particular family of stacked generalizations in the context of probabilistic forecasting, each one with a different sensitivity for how much the ensemble weights are allowed to vary across items, timestamps in the forecast horizon, and quantiles. Experimental results demonstrate the performance gain of the proposed method.

Published at ICML 2023. In this version, we clarify that the proof in fact yields a strictly stronger bound than the one stated in the ICML version
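
The selection rule the oracle-type result covers can be written directly. A minimal sketch, assuming a finite family of fixed ensemble weight vectors compared by cross-validated squared loss; the names, the loss, and the fold scheme are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

def select_stacked_weights(preds, y, weight_family, n_folds=5):
    """Choose, from a finite family of ensemble weight vectors, the one
    with the lowest cross-validated squared loss.
    preds: (n_models, n_obs) array of base-learner predictions."""
    folds = np.array_split(np.arange(preds.shape[1]), n_folds)
    def cv_loss(w):
        return np.mean([np.mean((y[f] - w @ preds[:, f]) ** 2)
                        for f in folds])
    losses = [cv_loss(np.asarray(w, float)) for w in weight_family]
    return weight_family[int(np.argmin(losses))]

# Usage: best = select_stacked_weights(preds, y, [(0.5, 0.5), (1.0, 0.0)])
```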

Pattern-Guided Diffusion Models 2025-12-15
Show

Diffusion models have shown promise in forecasting future data from multivariate time series. However, few existing methods account for recurring structures, or patterns, that appear within the data. We present Pattern-Guided Diffusion Models (PGDM), which leverage inherent patterns within temporal data for forecasting future time steps. PGDM first extracts patterns using archetypal analysis and estimates the most likely next pattern in the sequence. By guiding predictions with this pattern estimate, PGDM makes more realistic predictions that fit within the set of known patterns. We additionally introduce a novel uncertainty quantification technique based on archetypal analysis, and we dynamically scale the guidance level based on the pattern estimate uncertainty. We apply our method to two well-motivated forecasting applications, predicting visual field measurements and motion capture frames. On both, we show that pattern guidance improves PGDM's performance (MAE / CRPS) by up to 40.67% / 56.26% and 14.12% / 14.10%, respectively. PGDM also outperforms baselines by up to 65.58% / 84.83% and 93.64% / 92.55%.

Under review
Global Deep Forecasting with Patient-Specific Pharmacokinetics 2025-12-15
Show

Forecasting healthcare time series data is vital for early detection of adverse outcomes and patient monitoring. However, it can be challenging in practice due to variable medication administration and unique pharmacokinetic (PK) properties of each patient. To address these challenges, we propose a novel hybrid global-local architecture and a PK encoder that informs deep learning models of patient-specific treatment effects. We showcase the efficacy of our approach in achieving significant accuracy gains in a blood glucose forecasting task using both realistically simulated and real-world data. Our PK encoder surpasses baselines by up to 16.4% on simulated data and 4.9% on real-world data for individual patients during critical events of severely high and low glucose levels. Furthermore, our proposed hybrid global-local architecture outperforms patient-specific PK models by 15.8%, on average.

From Many Models, One: Macroeconomic Forecasting with Reservoir Ensembles 2025-12-15
Show

Model combination is a powerful approach to achieve superior performance with a set of models than by just selecting any single one. We study both theoretically and empirically the effectiveness of ensembles of Multi-Frequency Echo State Networks (MFESNs), which have been shown to achieve state-of-the-art macroeconomic time series forecasting results (Ballarin et al., 2024a). Hedge and Follow-the-Leader schemes are discussed, and their online learning guarantees are extended to the case of dependent data. In applications, our proposed Ensemble Echo State Networks show significantly improved predictive performance compared to individual MFESN models.
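
For reference, the Hedge scheme mentioned above is a few lines of exponential weighting. A minimal sketch, assuming a matrix of per-round, per-model losses; the learning rate `eta` is an illustrative choice, and the dependent-data guarantees discussed in the paper are a property of the analysis, not of this code.

```python
import numpy as np

def hedge_weights(losses, eta=0.1):
    """Hedge (exponential weights): at round t, model m receives weight
    proportional to exp(-eta * cumulative loss of m through round t-1).
    losses: (n_rounds, n_models) array; returns per-round weights."""
    cum = np.cumsum(losses, axis=0)
    w = np.exp(-eta * np.vstack([np.zeros(losses.shape[1]), cum[:-1]]))
    return w / w.sum(axis=1, keepdims=True)   # rows sum to 1
```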

Autotune: fast, accurate, and automatic tuning parameter selection for Lasso 2025-12-15
Show

Least absolute shrinkage and selection operator (Lasso), a popular method for high-dimensional regression, is now used widely for estimating high-dimensional time series models such as the vector autoregression (VAR). Selecting its tuning parameter efficiently and accurately remains a challenge, despite the abundance of available methods for doing so. We propose $\mathsf{autotune}$, a strategy for Lasso to automatically tune itself by optimizing a penalized Gaussian log-likelihood alternately over regression coefficients and noise standard deviation. Using extensive simulation experiments on regression and VAR models, we show that $\mathsf{autotune}$ is faster, and provides better generalization and model selection than established alternatives in low signal-to-noise regimes. In the process, $\mathsf{autotune}$ provides a new estimator of noise standard deviation that can be used for high-dimensional inference, and a new visual diagnostic procedure for checking the sparsity assumption on regression coefficients. Finally, we demonstrate the utility of $\mathsf{autotune}$ on a real-world financial data set. An R package based on C++ is also made publicly available on Github.

53 pages, 35 figures
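
The alternating idea can be sketched with an off-the-shelf solver. Below is a scaled-Lasso-style illustration, assuming a Lasso penalty scaled by the current noise standard deviation and a residual-based update of that deviation; this is not the authors' implementation, and `lam0` is an illustrative base penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso

def autotune_lasso(X, y, lam0=0.1, n_iter=50, tol=1e-6):
    """Alternate between a Lasso update of the coefficients and a
    residual-based update of the noise sd until the sd stabilizes."""
    sigma = float(np.std(y))
    for _ in range(n_iter):
        beta = Lasso(alpha=lam0 * sigma, fit_intercept=False).fit(X, y).coef_
        new_sigma = float(np.sqrt(np.mean((y - X @ beta) ** 2)))
        if abs(new_sigma - sigma) < tol:
            break
        sigma = new_sigma
    return beta, sigma
```
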
How PARTs assemble into wholes: Learning the relative composition of images 2025-12-15
Show

The composition of objects and their parts, along with object-object positional relationships, provides a rich source of information for representation learning. Hence, spatial-aware pretext tasks have been actively explored in self-supervised learning. Existing works commonly start from a grid structure, where the goal of the pretext task involves predicting the absolute position index of patches within a fixed grid. However, grid-based approaches fall short of capturing the fluid and continuous nature of real-world object compositions. We introduce PART, a self-supervised learning approach that leverages continuous relative transformations between off-grid patches to overcome these limitations. By modeling how parts relate to each other in a continuous space, PART learns the relative composition of images -- an off-grid structural relative positioning that is less tied to absolute appearance and can remain coherent under variations such as partial visibility or stylistic changes. In tasks requiring precise spatial understanding such as object detection and time series prediction, PART outperforms grid-based methods like MAE and DropPos, while maintaining competitive performance on global classification tasks. By breaking free from grid constraints, PART opens up a new trajectory for universal self-supervised pretraining across diverse datatypes -- from images to EEG signals -- with potential in medical imaging, video, and audio.

Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems 2025-12-15
Show

Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise, and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models), we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.

Accepted at the 30th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA)
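
The disturbance types named above are simple to emulate. A minimal sketch, assuming linear sensor drift, additive Gaussian noise, and irregular sampling simulated by NaN-masking; the magnitudes are illustrative, and the released benchmark defines its own perturbation levels.

```python
import numpy as np

def perturb_series(x, rng, drift_per_step=0.01, noise_sd=0.05, drop_prob=0.1):
    """Apply three disturbance types to a 1-D series: linear sensor
    drift, additive Gaussian measurement noise, and irregular sampling
    simulated by masking random points as NaN."""
    x = np.asarray(x, float).copy()
    x += drift_per_step * np.arange(len(x))       # sensor drift
    x += rng.normal(0.0, noise_sd, len(x))        # measurement noise
    x[rng.random(len(x)) < drop_prob] = np.nan    # irregular sampling
    return x

# Usage: degraded = perturb_series(series, np.random.default_rng(0))
```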

HTMformer: Hybrid Time and Multivariate Transformer for Time Series Forecasting 2025-12-15
Show

Transformer-based methods have achieved impressive results in time series forecasting. However, existing Transformers still exhibit limitations in sequence modeling as they tend to overemphasize temporal dependencies. This incurs additional computational overhead without yielding corresponding performance gains. We find that the performance of Transformers is highly dependent on the embedding method used to learn effective representations. To address this issue, we extract multivariate features to augment the effective information captured in the embedding layer, yielding multidimensional embeddings that convey richer and more meaningful sequence representations. These representations enable Transformer-based forecasters to better understand the series. Specifically, we introduce Hybrid Temporal and Multivariate Embeddings (HTME). The HTME extractor integrates a lightweight temporal feature extraction module with a carefully designed multivariate feature extraction module to provide complementary features, thereby achieving a balance between model complexity and performance. By combining HTME with the Transformer architecture, we present HTMformer, leveraging the enhanced feature extraction capability of the HTME extractor to build a lightweight forecaster. Experiments conducted on eight real-world datasets demonstrate that our approach outperforms existing baselines in both accuracy and efficiency.

Channel Dependence, Limited Lookback Windows, and the Simplicity of Datasets: How Biased is Time Series Forecasting? 2025-12-15
Show

In Time Series Forecasting (TSF), the lookback window (the length of historical data used for prediction) is a critical hyperparameter that is often set arbitrarily, undermining the validity of model evaluations. We argue that the lookback window must be tuned on a per-task basis to ensure fair comparisons. Our empirical results show that failing to do so can invert performance rankings, particularly when comparing univariate and multivariate methods. Experiments on standard benchmarks reposition Channel-Independent (CI) models, such as PatchTST, as state-of-the-art methods. However, we reveal this superior performance is largely an artifact of weak inter-channel correlations and simplicity of patterns within these specific datasets. Using Granger causality analysis and ODE datasets (with implicit channel correlations), we demonstrate that the true strength of multivariate Channel-Dependent (CD) models emerges on datasets with strong, inherent cross-channel dependencies, where they significantly outperform CI models. We conclude with three key recommendations for improving TSF research: (i) treat the lookback window as a critical hyperparameter to be tuned, (ii) use statistical analysis of datasets to inform the choice between CI and CD architectures, and (iii) favor CD models in applications with limited data.

Adaptive Information Routing for Multimodal Time Series Forecasting 2025-12-15
Show

Time series forecasting is a critical task for artificial intelligence with numerous real-world applications. Traditional approaches primarily rely on historical time series data to predict future values. However, in practical scenarios, this is often insufficient for accurate predictions due to the limited information available. To address this challenge, multimodal time series forecasting methods, which incorporate additional data modalities, mainly text data, alongside time series data, have been explored. In this work, we introduce the Adaptive Information Routing (AIR) framework, a novel approach for multimodal time series forecasting. Unlike existing methods that treat text data on par with time series data as interchangeable auxiliary features for forecasting, AIR leverages text information to dynamically guide the time series model by controlling how and to what extent multivariate time series information should be combined. We also present a text-refinement pipeline that employs a large language model to convert raw text data into a form suitable for multimodal forecasting, and we introduce a benchmark that facilitates multimodal forecasting experiments based on this pipeline. Experimental results with real-world market data, such as crude oil prices and exchange rates, demonstrate that AIR effectively modulates the behavior of the time series model using textual inputs, significantly enhancing forecasting accuracy in various time series forecasting tasks.

Robust tests for parameter change in conditionally heteroscedastic time series models 2025-12-15
Show

Structural changes and outliers often coexist, complicating statistical inference. This paper addresses the problem of testing for parameter changes in conditionally heteroscedastic time series models, particularly in the presence of outliers. To mitigate the impact of outliers, we introduce a two-step procedure comprising robust estimation and residual truncation. Based on this procedure, we propose a residual-based robust CUSUM test and its self-normalized counterpart. We derive the limiting null distributions of the proposed robust tests and establish their consistency. Simulation results demonstrate the strong robustness of the tests against outliers. To illustrate the practical application, we analyze Bitcoin data.

28 pages, 1 figure
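
To make the two-step idea concrete, the sketch below forms a CUSUM statistic from residuals truncated at a robust scale estimate. It is a generic illustration of residual truncation plus CUSUM, not the authors' exact test or its self-normalized counterpart; the truncation level `c` is an illustrative choice.

```python
import numpy as np

def truncated_cusum(resid, c=3.0):
    """Residual-based CUSUM with truncation: clip residuals at +/- c
    robust standard deviations (MAD-based) to damp outliers, then form
    the standard normalized CUSUM maximum."""
    r = np.asarray(resid, float)
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))   # robust sd
    r = np.clip(r, -c * scale, c * scale)                  # truncation step
    n = len(r)
    partial = np.cumsum(r - r.mean())
    return np.abs(partial).max() / (r.std() * np.sqrt(n))
```
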
Integrated Prediction and Multi-period Portfolio Optimization 2025-12-15
Show

Multi-period portfolio optimization is important for real portfolio management, as it accounts for transaction costs, path-dependent risks, and the intertemporal structure of trading decisions that single-period models cannot capture. Classical methods usually follow a two-stage framework: machine learning algorithms are employed to produce forecasts that closely fit the realized returns, and the predicted values are then used in a downstream portfolio optimization problem to determine the asset weights. This separation leads to a fundamental misalignment between predictions and decision outcomes, while also ignoring the impact of transaction costs. To bridge this gap, recent studies have proposed the idea of end-to-end learning, integrating the two stages into a single pipeline. This paper introduces IPMO (Integrated Prediction and Multi-period Portfolio Optimization), a model for multi-period mean-variance portfolio optimization with turnover penalties. The predictor generates multi-period return forecasts that parameterize a differentiable convex optimization layer, which in turn drives learning via portfolio performance. For scalability, we introduce a mirror-descent fixed-point (MDFP) differentiation scheme that avoids factorizing the Karush-Kuhn-Tucker (KKT) systems, which thus yields stable implicit gradients and nearly scale-insensitive runtime as the decision horizon grows. In experiments with real market data and two representative time-series prediction models, the IPMO method consistently outperforms the two-stage benchmarks in risk-adjusted performance net of transaction costs and achieves more coherent allocation paths. Our results show that integrating machine learning prediction with optimization in the multi-period setting improves financial outcomes and remains computationally tractable.

23 pages, 6 figures, and 4 tables

Improving Time Series Forecasting via Instance-aware Post-hoc Revision 2025-12-15
Show

Time series forecasting plays a vital role in various real-world applications and has attracted significant attention in recent decades. While recent methods have achieved remarkable accuracy by incorporating advanced inductive biases and training strategies, we observe that instance-level variations remain a significant challenge. These variations--stemming from distribution shifts, missing data, and long-tail patterns--often lead to suboptimal forecasts for specific instances, even when overall performance appears strong. To address this issue, we propose a model-agnostic framework, PIR, designed to enhance forecasting performance through Post-forecasting Identification and Revision. Specifically, PIR first identifies biased forecasting instances by estimating their accuracy. Based on this, the framework revises the forecasts using contextual information, including covariates and historical time series, from both local and global perspectives in a post-processing fashion. Extensive experiments on real-world datasets with mainstream forecasting models demonstrate that PIR effectively mitigates instance-level errors and significantly improves forecasting reliability.

Accepted by NeurIPS 2025 as a poster

Advancing Financial Engineering with Foundation Models: Progress, Applications, and Challenges 2025-12-15
Show

The advent of foundation models (FMs), large-scale pre-trained models with strong generalization capabilities, has opened new frontiers for financial engineering. While general-purpose FMs such as GPT-4 and Gemini have demonstrated promising performance in tasks ranging from financial report summarization to sentiment-aware forecasting, many financial applications remain constrained by unique domain requirements such as multimodal reasoning, regulatory compliance, and data privacy. These challenges have spurred the emergence of financial foundation models (FFMs): a new class of models explicitly designed for finance. This survey presents a comprehensive overview of FFMs, with a taxonomy spanning three key modalities: financial language foundation models (FinLFMs), financial time-series foundation models (FinTSFMs), and financial visual-language foundation models (FinVLFMs). We review their architectures, training methodologies, datasets, and real-world applications. Furthermore, we identify critical challenges in data availability, algorithmic scalability, and infrastructure constraints and offer insights into future research opportunities. We hope this survey can serve as both a comprehensive reference for understanding FFMs and a practical roadmap for future innovation.

Accepted by [J]. Engineering, 2025

Next-generation reservoir computing validated by classification task 2025-12-15
Show

An emerging computing paradigm, so-called next-generation reservoir computing (NG-RC), is investigated. True to its namesake, NG-RC requires no actual reservoir for input data mixing but rather computes the polynomial terms directly from the time series inputs. However, the benchmark tests reported so far have been one-sided, limited to prediction tasks for temporal waveforms such as the Lorenz 63 attractor and the Mackey-Glass chaotic signal. We demonstrate for the first time that NG-RC can perform classification tasks as well as conventional RC. This validates the versatile computational capability of NG-RC in both prediction and classification tasks.
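
Because NG-RC builds its features in closed form, the feature map fits in a few lines. A minimal sketch, assuming a univariate input, `k` delay taps, and quadratic monomials; a linear readout (e.g., ridge regression) trained on these rows would then serve either prediction or classification.

```python
import numpy as np

def ngrc_features(u, k=3):
    """NG-RC feature map: a constant, k time-delay taps of the input,
    and their unique pairwise products, computed directly from the
    series with no physical reservoir. One feature row per usable step."""
    u = np.asarray(u, float)
    rows = []
    for t in range(k - 1, len(u)):
        lin = u[t - k + 1:t + 1]                              # delay taps
        quad = [lin[i] * lin[j] for i in range(k)
                for j in range(i, k)]                          # quadratic terms
        rows.append(np.concatenate(([1.0], lin, quad)))
    return np.asarray(rows)
```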

DARTs: A Dual-Path Robust Framework for Anomaly Detection in High-Dimensional Multivariate Time Series 2025-12-14
Show

Multivariate time series anomaly detection (MTSAD) aims to accurately identify and localize complex abnormal patterns in large-scale industrial control systems. While existing approaches excel at recognizing distinct patterns in low-dimensional scenarios, they often fail to robustly capture long-range spatiotemporal dependencies when learning representations from high-dimensional noisy time series. To address these limitations, we propose DARTs, a robust long short-term dual-path framework with a window-aware spatiotemporal soft-fusion mechanism, which can be decomposed into three complementary components. Specifically, in the short-term path, we introduce a Multi-View Sparse Graph Learner and a Diffusion Multi-Relation Graph Unit that collaborate to adaptively capture hierarchical, discriminative short-term spatiotemporal patterns in high-noise time series. In the long-term path, we design a Multi-Scale Spatiotemporal Graph Constructor to model salient long-term dynamics within the high-dimensional representation space. Finally, a window-aware spatiotemporal soft-fusion mechanism is introduced to filter residual noise while seamlessly integrating anomalous patterns. Extensive qualitative and quantitative experimental results across mainstream datasets demonstrate the superiority and robustness of DARTs. A series of ablation studies is also conducted to explore the crucial design factors of the proposed components. Our code and model will be made publicly available soon.

Empirical Mode Decomposition and Graph Transformation of the MSCI World Index: A Multiscale Topological Analysis for Graph Neural Network Modeling 2025-12-14
Show

This study applies Empirical Mode Decomposition (EMD) to the MSCI World index and converts the resulting intrinsic mode functions (IMFs) into graph representations to enable modeling with graph neural networks (GNNs). Using CEEMDAN, we extract nine IMFs spanning high-frequency fluctuations to long-term trends. Each IMF is transformed into a graph using four time-series-to-graph methods: natural visibility, horizontal visibility, recurrence, and transition graphs. Topological analysis shows clear scale-dependent structure: high-frequency IMFs yield dense, highly connected small-world graphs, whereas low-frequency IMFs produce sparser networks with longer characteristic path lengths. Visibility-based methods are more sensitive to amplitude variability and typically generate higher clustering, while recurrence graphs better preserve temporal dependencies. These results provide guidance for designing GNN architectures tailored to the structural properties of decomposed components, supporting more effective predictive modeling of financial time series.

19 pages, 3 figures, 6 tables
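
Of the four time-series-to-graph mappings, the natural visibility graph is the simplest to state. A minimal sketch using the classical visibility criterion with a naive O(n^3) scan; the CEEMDAN decomposition and the other three graph constructions are not shown.

```python
import numpy as np

def natural_visibility_edges(y):
    """Natural visibility graph of a series: samples a < b are linked
    iff every intermediate sample lies strictly below the straight line
    joining (a, y_a) and (b, y_b). Naive construction for clarity."""
    y = np.asarray(y, float)
    edges = []
    for a in range(len(y)):
        for b in range(a + 1, len(y)):
            if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.append((a, b))
    return edges
```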

Noise-robust Contrastive Learning for Critical Transition Detection in Dynamical Systems 2025-12-14
Show

Detecting critical transitions in complex, noisy time-series data is a fundamental challenge across science and engineering. Such transitions may be anticipated by the emergence of a low-dimensional order parameter, whose signature is often masked by high-amplitude stochastic variability. Standard contrastive learning approaches based on deep neural networks, while promising for detecting critical transitions, are often overparameterized and sensitive to irrelevant noise, leading to inaccurate identification of critical points. To address these limitations, we propose a neural network architecture, constructed using singular value decomposition technique, together with a strictly semi-orthogonality-constrained training algorithm, to enhance the performance of traditional contrastive learning. Extensive experiments demonstrate that the proposed method matches the performance of traditional contrastive learning techniques in identifying critical transitions, yet is considerably more lightweight and markedly more resistant to noise.

under revision

Trajectory

Title Date Abstract Comment
Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes 2025-12-19
Show

We present Planning as Descent (PaD), a framework for offline goal-conditioned reinforcement learning that grounds trajectory synthesis in verification. Instead of learning a policy or explicit planner, PaD learns a goal-conditioned energy function over entire latent trajectories, assigning low energy to feasible, goal-consistent futures. Planning is realized as gradient-based refinement in this energy landscape, using identical computation during training and inference to reduce train-test mismatch common in decoupled modeling pipelines. PaD is trained via self-supervised hindsight goal relabeling, shaping the energy landscape around the planning dynamics. At inference, multiple trajectory candidates are refined under different temporal hypotheses, and low-energy plans balancing feasibility and efficiency are selected. We evaluate PaD on OGBench cube manipulation tasks. When trained on narrow expert demonstrations, PaD achieves state-of-the-art 95% success, strongly outperforming prior methods that peak at 68%. Remarkably, training on noisy, suboptimal data further improves success and plan efficiency, highlighting the benefits of verification-driven planning. Our results suggest learning to evaluate and refine trajectories provides a robust alternative to direct policy learning for offline, reward-free planning.

Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics 2025-12-19
Show

Urban mobility data are indispensable for urban planning, transportation demand forecasting, pandemic modeling, and many other applications; however, individual mobile phone-derived Global Positioning System traces cannot generally be shared with third parties owing to severe re-identification risks. Aggregated records, such as origin-destination (OD) matrices, offer partial insights but fail to capture the key behavioral properties of daily human movement, limiting realistic city-scale analyses. This study presents a privacy-preserving synthetic mobility dataset that reconstructs daily trajectories from aggregated inputs. The proposed method integrates OD flows with two complementary behavioral constraints: (1) dwell-travel time quantiles that are available only as coarse summary statistics and (2) the universal law for the daily distribution of the number of visited locations. Embedding these elements in a multi-objective optimization framework enables the reproduction of realistic distributions of human mobility while ensuring that no personal identifiers are required. The proposed framework is validated in two contrasting regions of Japan: (1) the 23 special wards of Tokyo, representing a dense metropolitan environment; and (2) Fukuoka Prefecture, where urban and suburban mobility patterns coexist. The resulting synthetic mobility data reproduce dwell-travel time and visit frequency distributions with high fidelity, while deviations in OD consistency remain within the natural range of daily fluctuations. The results of this study establish a practical synthesis pathway under real-world constraints, providing governments, urban planners, and industries with scalable access to high-resolution mobility data for reliable analytics without the need for sensitive personal records, and supporting practical deployments in policy and commercial domains.

9 pages, 4 figures
Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution 2025-12-18
Show

Diffusion models have transformed image synthesis through iterative denoising, by defining trajectories from noise to coherent data. While their capabilities are widely celebrated, a critical challenge remains unaddressed: ensuring responsible use by verifying whether an image originates from a model's training set, its novel generations or external sources. We introduce a framework that analyzes diffusion trajectories for this purpose. Specifically, we demonstrate that temporal dynamics across the entire trajectory allow for more robust classification and challenge the widely-adopted "Goldilocks zone" conjecture, which posits that membership inference is effective only within narrow denoising stages. More fundamentally, we expose critical flaws in current membership inference practices by showing that representative methods fail under distribution shifts or when model-generated data is present. For model attribution, we demonstrate a first white-box approach directly applicable to diffusion. Ultimately, we propose the unification of data provenance into a single, cohesive framework tailored to modern generative systems.

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text 2025-12-18
Show

We present WorldCanvas, a framework for promptable world events that enables rich, user-directed simulation by combining text, trajectories, and reference images. Unlike text-only approaches and existing trajectory-controlled image-to-video methods, our multimodal approach combines trajectories -- encoding motion, timing, and visibility -- with natural language for semantic intent and reference images for visual grounding of object identity, enabling the generation of coherent, controllable events that include multi-agent interactions, object entry/exit, reference-guided appearance and counterintuitive events. The resulting videos demonstrate not only temporal coherence but also emergent consistency, preserving object identity and scene despite temporary disappearance. By supporting expressive world events generation, WorldCanvas advances world models from passive predictors to interactive, user-shaped simulators. Our project page is available at: https://worldcanvas.github.io/.

Project page and code: https://worldcanvas.github.io/

Flowing from Reasoning to Motion: Learning 3D Hand Trajectory Prediction from Egocentric Human Interaction Videos 2025-12-18
Show

Prior works on 3D hand trajectory prediction are constrained by datasets that decouple motion from semantic supervision and by models that weakly link reasoning and action. To address these, we first present the EgoMAN dataset, a large-scale egocentric dataset for interaction stage-aware 3D hand trajectory prediction with 219K 6DoF trajectories and 3M structured QA pairs for semantic, spatial, and motion reasoning. We then introduce the EgoMAN model, a reasoning-to-motion framework that links vision-language reasoning and motion generation via a trajectory-token interface. Trained progressively to align reasoning with motion dynamics, our approach yields accurate and stage-aware trajectories with generalization across real-world scenes.

Project website: https://egoman-project.github.io

R3ST: A Synthetic 3D Dataset With Realistic Trajectories 2025-12-18
Show

Datasets are essential to train and evaluate computer vision models used for traffic analysis and to enhance road safety. Existing real datasets fit real-world scenarios, capturing authentic road object behaviors; however, they typically lack precise ground-truth annotations. In contrast, synthetic datasets play a crucial role, allowing for the annotation of a large number of frames without additional costs or extra time. However, a general drawback of synthetic datasets is the lack of realistic vehicle motion, since trajectories are generated using AI models or rule-based systems. In this work, we introduce R3ST (Realistic 3D Synthetic Trajectories), a synthetic dataset that overcomes this limitation by generating a synthetic 3D environment and integrating real-world trajectories derived from SinD, a bird's-eye-view dataset recorded from drone footage. The proposed dataset closes the gap between synthetic data and realistic trajectories, advancing research in trajectory forecasting of road vehicles, offering both accurate multimodal ground-truth annotations and authentic human-driven vehicle trajectories.

dLITE: Differentiable Lighting-Informed Trajectory Evaluation for On-Orbit Inspection 2025-12-17
Show

Visual inspection of space-borne assets is of increasing interest to spacecraft operators looking to plan maintenance, characterise damage, and extend the life of high-value satellites in orbit. The environment of Low Earth Orbit (LEO) presents unique challenges when planning inspection operations that maximise visibility, information, and data quality. Specular reflection of sunlight from spacecraft bodies, self-shadowing, and dynamic lighting in LEO significantly impact the quality of data captured throughout an orbit. This is exacerbated by the relative motion between spacecraft, which introduces variable imaging distances and attitudes during inspection. Planning inspection trajectories with the aid of simulation is a common approach. However, the ability to design and optimise an inspection trajectory specifically to improve the resulting image quality in proximity operations remains largely unexplored. In this work, we present $\partial$LITE, an end-to-end differentiable simulation pipeline for on-orbit inspection operations. We leverage state-of-the-art differentiable rendering tools and a custom orbit propagator to enable end-to-end optimisation of orbital parameters based on visual sensor data. $\partial$LITE enables us to automatically design non-obvious trajectories, vastly improving the quality and usefulness of attained data. To our knowledge, our differentiable inspection-planning pipeline is the first of its kind and provides new insights into modern computational approaches to spacecraft mission planning. Project page: https://appearance-aware.github.io/dlite/

13 pages, 9 images
MovSemCL: Movement-Semantics Contrastive Learning for Trajectory Similarity (Extension) 2025-12-17
Show

Trajectory similarity computation is fundamental functionality that is used for, e.g., clustering, prediction, and anomaly detection. However, existing learning-based methods exhibit three key limitations: (1) insufficient modeling of trajectory semantics and hierarchy, lacking both movement dynamics extraction and multi-scale structural representation; (2) high computational costs due to point-wise encoding; and (3) use of physically implausible augmentations that distort trajectory semantics. To address these issues, we propose MovSemCL, a movement-semantics contrastive learning framework for trajectory similarity computation. MovSemCL first transforms raw GPS trajectories into movement-semantics features and then segments them into patches. Next, MovSemCL employs intra- and inter-patch attentions to encode local as well as global trajectory patterns, enabling efficient hierarchical representation and reducing computational costs. Moreover, MovSemCL includes a curvature-guided augmentation strategy that preserves informative segments (e.g., turns and intersections) and masks redundant ones, generating physically plausible augmented views. Experiments on real-world datasets show that MovSemCL outperforms state-of-the-art methods, achieving mean ranks close to the ideal value of 1 on similarity search tasks and improvements of up to 20.3% on heuristic approximation, while reducing inference latency by up to 43.4%.

8 pages, 6 figures; accepted by AAAI 2026 as an Oral paper
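
A toy sketch of a curvature-guided augmentation in the spirit described above: score each point by its turning angle and randomly mask only low-curvature (straight, redundant) points, keeping turns intact. The angle threshold and mask rate are assumed values, not MovSemCL's settings.

```python
import numpy as np

def turning_angles(traj: np.ndarray) -> np.ndarray:
    """Absolute turning angle at each interior point of an (n, 2) trajectory."""
    v1 = traj[1:-1] - traj[:-2]
    v2 = traj[2:] - traj[1:-1]
    a1 = np.arctan2(v1[:, 1], v1[:, 0])
    a2 = np.arctan2(v2[:, 1], v2[:, 0])
    d = np.abs(a2 - a1)
    return np.minimum(d, 2 * np.pi - d)

def curvature_masked_view(traj, angle_thresh=0.2, mask_rate=0.4, seed=0):
    """Augmented view: randomly drop points whose turning angle is small."""
    rng = np.random.default_rng(seed)
    keep = np.ones(len(traj), dtype=bool)
    low_curv = np.where(turning_angles(traj) < angle_thresh)[0] + 1  # interior idx
    drop = rng.random(len(low_curv)) < mask_rate
    keep[low_curv[drop]] = False
    return traj[keep]

traj = np.cumsum(np.random.default_rng(1).normal(size=(100, 2)), axis=0)
print(len(curvature_masked_view(traj)), "of", len(traj), "points kept")
```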

Expand and Prune: Maximizing Trajectory Diversity for Effective GRPO in Generative Models 2025-12-17
Show

Group Relative Policy Optimization (GRPO) is a powerful technique for aligning generative models, but its effectiveness is bottlenecked by the conflict between large group sizes and prohibitive computational costs. In this work, we investigate the trade-off through empirical studies, yielding two key observations. First, we discover the reward clustering phenomenon in which many trajectories collapse toward the group-mean reward, offering limited optimization value. Second, we design a heuristic strategy named Optimal Variance Filtering (OVF), and verify that a high-variance subset of trajectories selected by OVF can outperform the larger, unfiltered group. However, this static, post-sampling OVF approach still incurs substantial computational overhead, as it performs unnecessary sampling for trajectories that are ultimately discarded. To resolve this, we propose Pro-GRPO (Proactive GRPO), a novel dynamic framework that integrates latent feature-based trajectory pruning into the sampling process. Through the early termination of reward-clustered trajectories, Pro-GRPO reduces computational overhead. Leveraging its efficiency, Pro-GRPO employs an "Expand-and-Prune" strategy: it first expands the initial sampling group to maximize trajectory diversity, and then applies multi-step OVF to the latents, avoiding prohibitive computational costs. Extensive experiments on both diffusion-based and flow-based models demonstrate the generality and effectiveness of our Pro-GRPO framework.

10 pages, 5 figures
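
A toy sketch of the post-sampling Optimal Variance Filtering idea from the abstract above: from a sampled group, keep the trajectories whose rewards deviate most from the group mean and discard reward-clustered ones. Selecting by absolute deviation is one simple proxy for a high-variance subset, and the subset size is an illustrative choice.

```python
import numpy as np

def ovf_select(rewards: np.ndarray, m: int) -> np.ndarray:
    """Indices of the m trajectories whose rewards deviate most from the
    group mean -- a simple proxy for the highest-variance subset."""
    deviation = np.abs(rewards - rewards.mean())
    return np.argsort(deviation)[-m:]

rewards = np.array([0.51, 0.49, 0.50, 0.90, 0.12, 0.52, 0.48, 0.05])
kept = sorted(ovf_select(rewards, m=4))
print("kept indices:", kept, "-> rewards:", rewards[kept])
```
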
SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model 2025-12-17
Show

This paper introduces a novel architecture for trajectory-conditioned forecasting of future 3D scene occupancy. In contrast to methods that rely on variational autoencoders (VAEs) to generate discrete occupancy tokens, which inherently limit representational capacity, our approach predicts multi-frame future occupancy in an end-to-end manner directly from raw image features. Inspired by the success of attention-based transformer architectures in foundational vision and language models such as GPT and VGGT, we employ a sparse occupancy representation that bypasses the intermediate bird's eye view (BEV) projection and its explicit geometric priors. This design allows the transformer to capture spatiotemporal dependencies more effectively. By avoiding both the finite-capacity constraint of discrete tokenization and the structural limitations of BEV representations, our method achieves state-of-the-art performance on the nuScenes benchmark for 1-3 second occupancy forecasting, outperforming existing approaches by a significant margin. Furthermore, it demonstrates robust scene dynamics understanding, consistently delivering high accuracy under arbitrary future trajectory conditioning.

TrajSyn: Privacy-Preserving Dataset Distillation from Federated Model Trajectories for Server-Side Adversarial Training 2025-12-17
Show

Deep learning models deployed on edge devices are increasingly used in safety-critical applications. However, their vulnerability to adversarial perturbations poses significant risks, especially in Federated Learning (FL) settings where identical models are distributed across thousands of clients. While adversarial training is a strong defense, it is difficult to apply in FL due to strict client-data privacy constraints and the limited compute available on edge devices. In this work, we introduce TrajSyn, a privacy-preserving framework that enables effective server-side adversarial training by synthesizing a proxy dataset from the trajectories of client model updates, without accessing raw client data. We show that TrajSyn consistently improves adversarial robustness on image classification benchmarks with no extra compute burden on the client device.

QoS-Aware Hierarchical Reinforcement Learning for Joint Link Selection and Trajectory Optimization in SAGIN-Supported UAV Mobility Management 2025-12-17
Show

Due to the significant variations in unmanned aerial vehicle (UAV) altitude and horizontal mobility, it becomes difficult for any single network to ensure continuous and reliable three-dimensional coverage. Towards that end, the space-air-ground integrated network (SAGIN) has emerged as an essential architecture for enabling ubiquitous UAV connectivity. To address the pronounced disparities in coverage and signal characteristics across heterogeneous networks, this paper formulates UAV mobility management in SAGIN as a constrained multi-objective joint optimization problem. The formulation couples discrete link selection with continuous trajectory optimization. Building on this, we propose a two-level multi-agent hierarchical deep reinforcement learning (HDRL) framework that decomposes the problem into two alternately solvable subproblems. To map complex link selection decisions into a compact discrete action space, we conceive a double deep Q-network (DDQN) algorithm at the top level, which achieves stable and high-quality policy learning through double Q-value estimation. To handle the continuous trajectory action space while satisfying quality of service (QoS) constraints, we integrate the maximum-entropy mechanism of the soft actor-critic (SAC) and employ a Lagrangian-based constrained SAC (CSAC) algorithm at the lower level, which dynamically adjusts the Lagrange multipliers to balance constraint satisfaction and policy optimization. Moreover, the proposed algorithm can be extended to multi-UAV scenarios under the centralized training and decentralized execution (CTDE) paradigm, which enables more generalizable policies. Simulation results demonstrate that the proposed scheme substantially outperforms existing benchmarks in throughput, link switching frequency and QoS satisfaction.

This work has been submitted to the IEEE for possible publication

Beyond Proximity: A Keypoint-Trajectory Framework for Classifying Affiliative and Agonistic Social Networks in Dairy Cattle 2025-12-17
Show

Precision livestock farming requires objective assessment of social behavior to support herd welfare monitoring, yet most existing approaches infer interactions using static proximity thresholds that cannot distinguish affiliative from agonistic behaviors in complex barn environments. This limitation constrains the interpretability of automated social network analysis in commercial settings. We present a pose-based computational framework for interaction classification that moves beyond proximity heuristics by modeling the spatiotemporal geometry of anatomical keypoints. Rather than relying on pixel-level appearance or simple distance measures, the proposed method encodes interaction-specific motion signatures from keypoint trajectories, enabling differentiation of social interaction valence. The framework is implemented as an end-to-end computer vision pipeline integrating YOLOv11 for object detection (mAP@0.50: 96.24%), supervised individual identification (98.24% accuracy), ByteTrack for multi-object tracking (81.96% accuracy), ZebraPose for 27-point anatomical keypoint estimation, and a support vector machine classifier trained on pose-derived distance dynamics. On annotated interaction clips collected from a commercial dairy barn, the classifier achieved 77.51% accuracy in distinguishing affiliative and agonistic behaviors using pose information alone. Comparative evaluation against a proximity-only baseline shows substantial gains in behavioral discrimination, particularly for affiliative interactions. The results establish a proof-of-concept for automated, vision-based inference of social interactions suitable for constructing interaction-aware social networks, with near-real-time performance on commodity hardware.

36 pages, 12 figures, 8 tables

ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts 2025-12-16
Show

Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challenges for reward propagation, resulting in inaccurate value estimation and degraded policy performance. While trajectory stitching via generative models offers a promising solution, existing augmentation methods frequently produce trajectories that are either confined to the support of the behavior policy or violate the underlying dynamics, thereby limiting their effectiveness for policy improvement. We propose ASTRO, a data augmentation framework that generates distributionally novel and dynamics-consistent trajectories for offline RL. ASTRO first learns a temporal-distance representation to identify distinct and reachable stitch targets. We then employ a dynamics-guided stitch planner that adaptively generates connecting action sequences via Rollout Deviation Feedback, defined as the gap between the target state sequence and the state sequence actually reached by executing the predicted actions, to improve the feasibility and reachability of trajectory stitching. This approach facilitates effective augmentation through stitching and ultimately enhances policy learning. ASTRO outperforms prior offline RL augmentation methods across various algorithms, achieving notable performance gains on the challenging OGBench suite and demonstrating consistent improvements on standard offline RL benchmarks such as D4RL.

Recent Advances in Multi-Agent Human Trajectory Prediction: A Comprehensive Review 2025-12-16
Show

With the emergence of powerful data-driven methods in human trajectory prediction (HTP), a finer understanding of multi-agent interactions is within reach, with important implications in areas such as social robot navigation, autonomous navigation, and crowd modeling. This survey reviews some of the most recent advancements in deep learning-based multi-agent trajectory prediction, focusing on studies published between 2020 and 2025. We categorize the existing methods based on their architectural design, their input representations, and their overall prediction strategies, placing a particular emphasis on models evaluated using the ETH/UCY benchmark. Furthermore, we highlight key challenges and future research directions in the field of multi-agent HTP.

45 pages
DRAW2ACT: Turning Depth-Encoded Trajectories into Robotic Demonstration Videos 2025-12-16
Show

Video diffusion models provide powerful real-world simulators for embodied AI but remain limited in controllability for robotic manipulation. Recent works on trajectory-conditioned video generation address this gap but often rely on 2D trajectories or single modality conditioning, which restricts their ability to produce controllable and consistent robotic demonstrations. We present DRAW2ACT, a depth-aware trajectory-conditioned video generation framework that extracts multiple orthogonal representations from the input trajectory, capturing depth, semantics, shape and motion, and injects them into the diffusion model. Moreover, we propose to jointly generate spatially aligned RGB and depth videos, leveraging cross-modality attention mechanisms and depth supervision to enhance the spatio-temporal consistency. Finally, we introduce a multimodal policy model conditioned on the generated RGB and depth sequences to regress the robot's joint angles. Experiments on Bridge V2, Berkeley Autolab, and simulation benchmarks show that DRAW2ACT achieves superior visual fidelity and consistency while yielding higher manipulation success rates compared to existing baselines.

Trajectory Tracking for Multi-Manipulator Systems in Constrained Environments 2025-12-16
Show

We consider the problem of cooperative manipulation by a mobile multi-manipulator system operating in obstacle-cluttered and highly constrained environments under spatio-temporal task specifications. The task requires transporting a grasped object while respecting both continuous robot dynamics and discrete geometric constraints arising from obstacles and narrow passages. To address this hybrid structure, we propose a multi-rate planning and control framework that combines offline generation of an STL-satisfying object trajectory and collision-free base footprints with online constrained inverse kinematics and continuous-time feedback control. The resulting closed-loop system enables coordinated reconfiguration of multiple manipulators while tracking the desired object motion. The approach is evaluated in high-fidelity physics simulations using three Franka Emika Panda mobile manipulators rigidly grasping an object.

DTRec: Learning Dynamic Reasoning Trajectories for Sequential Recommendation 2025-12-16
Show

Inspired by advances in LLMs, reasoning-enhanced sequential recommendation performs multi-step deliberation before making final predictions, unlocking greater potential for capturing user preferences. However, current methods are constrained by static reasoning trajectories that are ill-suited for the diverse complexity of user behaviors. They suffer from two key limitations: (1) a static reasoning direction, which uses flat supervision signals misaligned with human-like hierarchical reasoning, and (2) a fixed reasoning depth, which inefficiently applies the same computational effort to all users, regardless of pattern complexity. This rigidity leads to suboptimal performance and significant computational waste. To overcome these challenges, we propose DTRec, a novel and effective framework that explores the Dynamic reasoning Trajectory for Sequential Recommendation along both direction and depth. To guide the direction, we develop Hierarchical Process Supervision (HPS), which provides coarse-to-fine supervisory signals to emulate the natural, progressive refinement of human cognitive processes. To optimize the depth, we introduce the Adaptive Reasoning Halting (ARH) mechanism that dynamically adjusts the number of reasoning steps by jointly monitoring three indicators. Extensive experiments on three real-world datasets demonstrate the superiority of our approach, achieving up to a 24.5% performance improvement over strong baselines while simultaneously reducing computational cost by up to 41.6%.

Under Review
Stable Trajectory Clustering: An Efficient Split and Merge Algorithm 2025-12-15
Show

Clustering algorithms fundamentally group data points by characteristics to identify patterns. Over the past two decades, researchers have extended these methods to analyze trajectories of humans, animals, and vehicles, studying their behavior and movement across applications. This paper presents whole-trajectory clustering and sub-trajectory clustering algorithms based on DBSCAN line segment clustering, which encompasses two key events: split and merge of line segments. The events are utilized to capture object movement history based on the average Euclidean distance between line segments. In this framework, whole-trajectory clustering considers entire entities' trajectories, whereas sub-trajectory clustering employs a sliding window model to identify local similarity patterns. Many existing trajectory clustering algorithms respond to temporary anomalies in data by splitting trajectories, which often obscures otherwise consistent clustering patterns and leads to less reliable insights. To address this, we introduce the stable trajectory clustering algorithm, which leverages the mean absolute deviation concept to demonstrate that selective omission of transient deviations not only preserves the integrity of clusters but also improves their stability and interpretability. We evaluate all proposed algorithms on real trajectory datasets to illustrate their effectiveness and sensitivity to parameter variations.
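
A minimal sketch of the stability idea, assuming per-step distances of a trajectory's segments to a cluster centroid: flag a deviation as transient, and thus safe to omit rather than split on, when it is an outlier relative to the mean absolute deviation. The multiplier `k` is an assumed parameter, not the paper's.

```python
import numpy as np

def is_transient_deviation(distances: np.ndarray, t: int, k: float = 3.0) -> bool:
    """True if the segment distance at index t is an outlier relative to the
    mean absolute deviation, i.e., a temporary anomaly that should be omitted
    rather than trigger a cluster split."""
    mu = distances.mean()
    mad = np.abs(distances - mu).mean()
    return abs(distances[t] - mu) > k * mad

d = np.array([1.0, 1.1, 0.9, 1.0, 4.8, 1.0, 1.1])  # brief excursion at index 4
print(is_transient_deviation(d, t=4))  # True -> omit it, keep the cluster stable
```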

Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation 2025-12-15
Show

This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs), with the goal of improving computational speed on parallel architectures. The MAP estimation problem is reformulated as a continuous-time optimal control problem based on the Onsager-Machlup functional. This reformulation enables the use of a previously proposed parallel-in-time solution for optimal control problems, which we adapt to the current problem. The structure of the resulting optimal control problem admits a parallel solution based on parallel associative scan algorithms. In the linear Gaussian special case, it yields a parallel Kalman-Bucy filter and a parallel continuous-time Rauch-Tung-Striebel smoother. These linear computational methods are further extended to nonlinear continuous-time state-space models through Taylor expansions. We also present the corresponding parallel two-filter smoother. The graphics processing unit (GPU) experiments on linear and nonlinear models demonstrate that the proposed framework achieves a significant speedup in computations while maintaining the accuracy of sequential algorithms.
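
A simplified analogue of the parallel-in-time principle above: a sequential affine recursion x_{t+1} = A_t x_t + b_t is an associative composition of elements (A_t, b_t), so all prefixes can be computed by an associative scan with logarithmic parallel depth (simulated serially below). The paper applies the same idea to Kalman-Bucy filter and smoother elements; this sketch only illustrates the scan itself.

```python
import numpy as np

def combine(e1, e2):
    """Compose affine maps: apply (A1, b1) first, then (A2, b2)."""
    A1, b1 = e1
    A2, b2 = e2
    return (A2 @ A1, A2 @ b1 + b2)

def associative_scan(elems):
    """Inclusive prefix scan via recursive doubling (O(log n) parallel depth)."""
    elems = list(elems)
    n, shift = len(elems), 1
    while shift < n:
        elems = [elems[i] if i < shift else combine(elems[i - shift], elems[i])
                 for i in range(n)]
        shift *= 2
    return elems

rng = np.random.default_rng(0)
elems = [(rng.normal(size=(2, 2)) * 0.5, rng.normal(size=2)) for _ in range(8)]
prefixes = associative_scan(elems)

# Check against the plain sequential recursion from x_0 = 0.
x = np.zeros(2)
for (A, b), (Ap, bp) in zip(elems, prefixes):
    x = A @ x + b
    assert np.allclose(x, Ap @ np.zeros(2) + bp)
print("parallel scan matches sequential recursion")
```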

Humanoid Robot Running Through Random Stepping Stones and Jumping Over Obstacles: Step Adaptation Using Spring-Mass Trajectories 2025-12-15
Show

This study proposes a step adaptation framework for running through spring-mass trajectories and deadbeat control gain libraries. It includes four main parts: (1) Automatic spring-mass trajectory library generation; (2) Deadbeat control gain library generation through an actively controlled template model that resembles the whole-body dynamics well; (3) Trajectory selection policy development for step adaptation; (4) Mapping spring-mass trajectories to a humanoid model through a whole-body control (WBC) framework also accounting for closed-kinematic chain systems, self collisions, and reactive limb swinging. We show the inclusiveness and the robustness of the proposed framework through various challenging and agile behaviors such as running through randomly generated stepping stones, jumping over random obstacles, performing slalom motions, changing the running direction suddenly with a random leg, and rejecting significant disturbances and uncertainties through the MuJoCo physics simulator. We also perform additional simulations under a comprehensive set of uncertainties and noise to better justify the proposed method's robustness to real-world challenges, including signal noise, imprecision, modeling errors, and delays. All the aforementioned behaviors are performed with a single library and the same set of WBC control parameters without additional tuning. The spring-mass and the deadbeat control gain library are automatically computed in 4.5 seconds in total for 315 different trajectories.

Accepted for publication in Biomimetic Intelligence and Robotics. Supplemental video: https://youtu.be/HlAg2nbNct4

WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory 2025-12-15
Show

The Automatic Identification System (AIS) enables data-driven maritime surveillance but suffers from reliability issues and irregular intervals. We address vessel destination estimation using global-scope AIS data by proposing a differentiated approach that recasts long port-to-port trajectories as a nested sequence structure. Using spatial grids, this method mitigates spatio-temporal bias while preserving detailed resolution. We introduce a novel deep learning architecture, WAY, designed to process these reformulated trajectories for long-term destination estimation days to weeks in advance. WAY comprises a trajectory representation layer and Channel-Aggregative Sequential Processing (CASP) blocks. The representation layer generates multi-channel vector sequences from kinematic and non-kinematic features. CASP blocks utilize multi-headed channel- and self-attention for aggregation and sequential information delivery. Additionally, we propose a task-specialized Gradient Dropout (GD) technique to enable many-to-many training on single labels, preventing biased feedback surges by stochastically blocking gradient flow based on sample length. Experiments on 5-year AIS data demonstrate WAY's superiority over conventional spatial grid-based approaches regardless of trajectory progression. Results further confirm that adopting GD leads to performance gains. Finally, we explore WAY's potential for real-world application through multitask learning for ETA estimation.

Accepted to IEEE Transactions on Aerospace and Electronic Systems (TAES)
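
A toy sketch of a sample-length-based gradient dropout, assuming per-step losses from many-to-many training on a single label: gradients are stochastically blocked on a fraction of steps that grows with sequence length, while forward loss values stay unchanged. The drop schedule below is an assumption, not the paper's GD design.

```python
import torch

def gd_loss(step_losses: torch.Tensor, seq_len: int, max_len: int = 500,
            rng: torch.Generator | None = None) -> torch.Tensor:
    """step_losses: (T,) per-step losses carrying gradients. Longer sequences
    drop gradients on more steps; dropped steps keep their forward value."""
    p_drop = min(0.9, seq_len / max_len)
    mask = torch.rand(step_losses.shape, generator=rng) < p_drop
    # detach() blocks the gradient while leaving the loss value unchanged.
    mixed = torch.where(mask, step_losses.detach(), step_losses)
    return mixed.mean()

losses = torch.rand(20, requires_grad=True)
loss = gd_loss(losses * 2.0, seq_len=300)
loss.backward()
print("fraction of blocked grads:", (losses.grad == 0).float().mean().item())
```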

Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning 2025-12-15
Show

Federated Learning (FL) enables collaborative training while preserving privacy, yet Gradient Inversion Attacks (GIAs) pose severe threats by reconstructing private data from shared gradients. In realistic FedAvg scenarios with multi-step updates, existing surrogate methods like SME rely on linear interpolation to approximate client trajectories for privacy leakage. However, we demonstrate that linear assumptions fundamentally underestimate SGD's nonlinear complexity, encountering irreducible approximation barriers in non-convex landscapes with only one-dimensional expressiveness. We propose Non-Linear Surrogate Model Extension (NL-SME), the first framework introducing learnable quadratic Bézier curves for trajectory modeling in GIAs against FL. NL-SME leverages ...

Multi-Trajectory Physics-Informed Neural Networks for HJB Equations with Hard-Zero Terminal Inventory: Optimal Execution on Synthetic & SPY Data 2025-12-14
Show

We study optimal trade execution with a hard-zero terminal inventory constraint, modeled via Hamilton-Jacobi-Bellman (HJB) equations. Vanilla PINNs often under-enforce this constraint and produce unstable controls. We propose a Multi-Trajectory PINN (MT-PINN) that adds a rollout-based trajectory loss and propagates a terminal penalty on terminal inventory via backpropagation-through-time, directly enforcing zero terminal inventory. A lightweight lambda-curriculum is adopted to stabilize training as the state expands from a risk-neutral reduced HJB to a risk-averse HJB. On the Gatheral-Schied single-asset model, MT-PINN aligns closely with their derived closed-form solutions and concentrates terminal inventory tightly around zero while reducing errors along optimal paths. We apply MT-PINNs on SPY intraday data, matching TWAP when risk-neutral, and achieving lower exposure and competitive costs, especially in falling windows, for higher risk-aversion.

24 pages, 19 figures. Accepted to the NeurIPS 2025 Workshop on Generative AI in Finance
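
A minimal PyTorch sketch of the rollout-based terminal penalty described above: roll a policy forward over the trading horizon and backpropagate a penalty on the terminal inventory q_T through the entire rollout (backpropagation-through-time). The quadratic impact cost, network size, and penalty weight are toy assumptions, not the paper's HJB setup.

```python
import torch

policy = torch.nn.Sequential(torch.nn.Linear(2, 32), torch.nn.Tanh(),
                             torch.nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
T, dt, lam = 20, 1.0 / 20, 10.0

for step in range(200):
    q = torch.ones(64, 1)            # start fully invested
    cost = torch.zeros(64, 1)
    for t in range(T):
        tt = torch.full_like(q, t * dt)
        rate = torch.relu(policy(torch.cat([q, tt], dim=1)))  # selling rate
        cost = cost + rate**2 * dt   # quadratic temporary impact (toy)
        q = q - rate * dt
    loss = cost.mean() + lam * (q**2).mean()  # terminal-inventory penalty
    opt.zero_grad()
    loss.backward()                  # gradients flow through the whole rollout
    opt.step()

print("mean terminal inventory:", q.abs().mean().item())
```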

Coherent Multi-Agent Trajectory Forecasting in Team Sports with CausalTraj 2025-12-14
Show

Jointly forecasting trajectories of multiple interacting agents is a core challenge in sports analytics and other domains involving complex group dynamics. Accurate prediction enables realistic simulation and strategic understanding of gameplay evolution. Most existing models are evaluated solely on per-agent accuracy metrics (minADE, minFDE), which assess each agent independently on its best-of-k prediction. However, these metrics overlook whether the model learns which predicted trajectories can jointly form a plausible multi-agent future. Many state-of-the-art models are designed and optimized primarily based on these metrics. As a result, they may underperform on joint predictions and also fail to generate coherent, interpretable multi-agent scenarios in team sports. We propose CausalTraj, a temporally causal, likelihood-based model that is built to generate jointly probable multi-agent trajectory forecasts. To better assess collective modeling capability, we emphasize joint metrics (minJADE, minJFDE) that measure joint accuracy across agents within the best generated scenario sample. Evaluated on the NBA SportVU, Basketball-U, and Football-U datasets, CausalTraj achieves competitive per-agent accuracy and the best recorded results on joint metrics, while yielding qualitatively coherent and realistic gameplay evolutions.

9 pages, 3 figures, accepted to the AI4TS Workshop at AAAI 2026
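
A short sketch of the joint metrics emphasized above: minJADE/minJFDE take the minimum over sampled scenes of the scene-level (all-agents) error, so every agent is scored against the same joint sample, unlike per-agent best-of-k minADE/minFDE.

```python
import numpy as np

def min_jade_jfde(preds: np.ndarray, gt: np.ndarray):
    """preds: (k, A, T, 2) sampled scenes; gt: (A, T, 2) ground truth."""
    err = np.linalg.norm(preds - gt[None], axis=-1)   # (k, A, T)
    jade = err.mean(axis=(1, 2))                      # scene-level ADE per sample
    jfde = err[:, :, -1].mean(axis=1)                 # scene-level FDE per sample
    return jade.min(), jfde.min()                     # min over joint samples

rng = np.random.default_rng(0)
gt = rng.normal(size=(5, 12, 2))                      # 5 agents, 12 timesteps
preds = gt[None] + rng.normal(scale=0.3, size=(6, 5, 12, 2))  # 6 scene samples
print(min_jade_jfde(preds, gt))
```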

PoreTrack3D: A Benchmark for Dynamic 3D Gaussian Splatting in Pore-Scale Facial Trajectory Tracking 2025-12-14
Show

We introduce PoreTrack3D, the first benchmark for dynamic 3D Gaussian splatting in pore-scale, non-rigid 3D facial trajectory tracking. It contains over 440,000 facial trajectories in total, among which more than 52,000 are longer than 10 frames, including 68 manually reviewed trajectories that span the entire 150 frames. To the best of our knowledge, PoreTrack3D is the first benchmark dataset to capture both traditional facial landmarks and pore-scale keypoint trajectories, advancing the study of fine-grained facial expressions through the analysis of subtle skin-surface motion. We systematically evaluate state-of-the-art dynamic 3D Gaussian splatting methods on PoreTrack3D, establishing the first performance baseline in this domain. Overall, the pipeline developed for this benchmark dataset's creation establishes a new framework for high-fidelity facial motion capture and dynamic 3D reconstruction. Our dataset is publicly available at: https://github.com/JHXion9/PoreTrack3D

Measuring What Matters: Scenario-Driven Evaluation for Trajectory Predictors in Autonomous Driving 2025-12-13
Show

Being able to anticipate the motion of surrounding agents is essential for the safe operation of autonomous driving systems in dynamic situations. While various methods have been proposed for trajectory prediction, current evaluation practices still rely on error-based metrics (e.g., ADE, FDE), which reveal accuracy from a post-hoc view but ignore the actual effect the predictor has on self-driving vehicles (SDVs), especially in complex interactive scenarios: a high-quality predictor should not only pursue accuracy but also capture all possible directions a neighboring agent might move in, to support the SDVs' cautious decision-making. Given that existing metrics hardly account for this standard, we propose a comprehensive pipeline that adaptively evaluates a predictor's performance along two dimensions: accuracy and diversity. Based on the criticality of the driving scenario, these two dimensions are dynamically combined into a final score for the predictor's performance. Extensive experiments on a closed-loop benchmark using real-world datasets show that our pipeline yields a more reasonable evaluation than traditional metrics by better reflecting the correlation between a predictor's evaluation and the autonomous vehicle's driving performance. This evaluation pipeline offers a robust way to select the predictor that potentially contributes most to the SDV's driving performance.

9 pages, 8 figures
STITCHER: Constrained Trajectory Planning in Complex Environments with Real-Time Motion Primitive Search 2025-12-12
Show

Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy state or actuator constraints. Modern trajectory planning techniques primarily use numerical optimization, as they enable the systematic computation of high-quality, expressive trajectories that satisfy various constraints. However, stringent requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical scenarios. This work presents an optimization-free planning framework called STITCHER that stitches short trajectory segments together with graph search to compute long-range, expressive, and near-optimal trajectories in real-time. STITCHER outperforms modern optimization-based planners through our innovative planning architecture and several algorithmic developments that make real-time planning possible. Extensive simulation testing is performed to analyze the algorithmic components that make up STITCHER, along with a thorough comparison with two state-of-the-art optimization planners. Simulation tests show that safe trajectories can be created within a few milliseconds for paths that span the entirety of two 50 m x 50 m environments. Hardware tests with a custom quadrotor verify that STITCHER can produce trackable paths in real-time while respecting nonconvex constraints, such as limits on tilt angle and motor forces, which are otherwise hard to include in optimization-based planners.

STITCHER: Real-Time Trajectory Planning with Motion Primitive Search 2025-12-12
Show

Autonomous high-speed navigation through large, complex environments requires real-time generation of agile trajectories that are dynamically feasible, collision-free, and satisfy constraints. Most modern trajectory planning techniques rely on numerical optimization because high-quality, expressive trajectories that satisfy constraints can be systematically computed. However, strict requirements on computation time and the risk of numerical instability can limit the use of optimization-based planners in safety-critical situations. This work presents an optimization-free planning framework called STITCHER that leverages graph search to generate long-range trajectories by stitching short trajectory segments together in real time. STITCHER is shown to outperform modern optimization-based planners through its innovative planning architecture and several algorithmic developments that make real-time planning possible. Simulation results show safe trajectories through complex environments can be generated in milliseconds that cover tens of meters.

Capacitive Touchscreens at Risk: Recovering Handwritten Trajectory on Smartphone via Electromagnetic Emanations 2025-12-12
Show

This paper reveals and exploits a critical security vulnerability: the electromagnetic (EM) side channel of capacitive touchscreens leaks sufficient information to recover fine-grained, continuous handwriting trajectories. We present Touchscreen Electromagnetic Side-channel Leakage Attack (TESLA), a non-contact attack framework that captures EM signals generated during on-screen writing and regresses them into two-dimensional (2D) handwriting trajectories in real time. Extensive evaluations across a variety of commercial off-the-shelf (COTS) smartphones show that TESLA achieves 77% character recognition accuracy and a Jaccard index of 0.74, demonstrating its capability to recover highly recognizable motion trajectories that closely resemble the original handwriting under realistic attack conditions.

Rethinking Expert Trajectory Utilization in LLM Post-training 2025-12-12
Show

While effective post-training integrates Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), the optimal mechanism for utilizing expert trajectories remains unresolved. We propose the Plasticity-Ceiling Framework to theoretically ground this landscape, decomposing performance into foundational SFT performance and the subsequent RL plasticity. Through extensive benchmarking, we establish the Sequential SFT-then-RL pipeline as the superior standard, overcoming the stability deficits of synchronized approaches. Furthermore, we derive precise scaling guidelines: (1) Transitioning to RL at the SFT Stable or Mild Overfitting Sub-phase maximizes the final ceiling by securing foundational SFT performance without compromising RL plasticity; (2) Refuting "Less is More" in the context of SFT-then-RL scaling, we demonstrate that Data Scale determines the primary post-training potential, while Trajectory Difficulty acts as a performance multiplier; and (3) Identifying that the Minimum SFT Validation Loss serves as a robust indicator for selecting the expert trajectories that maximize the final performance ceiling. Our findings provide actionable guidelines for maximizing the value extracted from expert trajectories.

24 pages, 5 figures, under review

Stabilising Learner Trajectories: A Doubly Robust Evaluation of AI-Guided Student Support using Activity Theory 2025-12-11
Show

While predictive models are increasingly common in higher education, causal evidence regarding the interventions they trigger remains rare. This study evaluates an AI-guided student support system at a large university using doubly robust propensity score matching. We advance the methodology for learning analytics evaluation by leveraging time-aligned, dynamic AI probability of success scores to match 1,859 treated students to controls, thereby mitigating the selection and immortal time biases often overlooked in observational studies. Results indicate that the intervention effectively stabilised precarious trajectories, and compared to the control group, supported students significantly reduced their course failure rates and achieved higher cumulative grades. However, effects on the speed of qualification completion were positive but statistically constrained. We interpreted these findings through Activity Theory, framing the intervention as a socio-technical brake that interrupts and slows the accumulation of academic failure among at-risk students. The student support-AI configuration successfully resolved the primary contradiction of immediate academic risk, but secondary contradictions within institutional structures limited the acceleration of degree completion. We conclude that while AI-enabled support effectively arrests decline, translating this stability into faster progression requires aligning intervention strategies with broader institutional governance.

Targeted Data Protection for Diffusion Model by Matching Training Trajectory 2025-12-11
Show

Recent advancements in diffusion models have made fine-tuning text-to-image models for personalization increasingly accessible, but have also raised significant concerns regarding unauthorized data usage and privacy infringement. Current protection methods are limited to passively degrading image quality, failing to achieve stable control. While Targeted Data Protection (TDP) offers a promising paradigm for active redirection toward user-specified target concepts, existing TDP attempts suffer from poor controllability due to snapshot-matching approaches that fail to account for complete learning dynamics. We introduce TAFAP (Trajectory Alignment via Fine-tuning with Adversarial Perturbations), the first method to successfully achieve effective TDP by controlling the entire training trajectory. Unlike snapshot-based methods whose protective influence is easily diluted as training progresses, TAFAP employs trajectory-matching inspired by dataset distillation to enforce persistent, verifiable transformations throughout fine-tuning. We validate our method through extensive experiments, demonstrating the first successful targeted transformation in diffusion models with simultaneous control over both identity and visual patterns. TAFAP significantly outperforms existing TDP attempts, achieving robust redirection toward target concepts while maintaining high image quality. This work enables verifiable safeguards and provides a new framework for controlling and tracing alterations in diffusion model outputs.

AAAI 2026
Dynamics of Agentic Loops in Large Language Models: A Geometric Theory of Trajectories 2025-12-11
Show

Agentic systems built on large language models operate through recursive feedback loops, where each output becomes the next input. Yet the geometric behavior of these agentic loops (whether they converge, diverge, or exhibit more complex dynamics) remains poorly understood. This paper introduces a geometric framework for analyzing agentic trajectories in semantic embedding space, treating iterative transformations as discrete dynamical systems. We distinguish the artifact space, where linguistic transformations occur, from the embedding space, where geometric measurements are performed. Because cosine similarity is biased by embedding anisotropy, we introduce an isotonic calibration that eliminates systematic bias and aligns similarities with human semantic judgments while preserving high local stability. This enables rigorous measurement of trajectories, clusters and attractors. Through controlled experiments on singular agentic loops, we identify two fundamental regimes. A contractive rewriting loop converges toward a stable attractor with decreasing dispersion, while an exploratory summarize and negate loop produces unbounded divergence with no cluster formation. These regimes display qualitatively distinct geometric signatures of contraction and expansion. Our results show that prompt design directly governs the dynamical regime of an agentic loop, enabling systematic control of convergence, divergence and trajectory structure in iterative LLM transformations.
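
A minimal sketch of an isotonic similarity calibration: fit a monotone map from raw cosine similarities to human judgment scores with scikit-learn's IsotonicRegression. Monotonicity preserves the ordering of similarities (local stability) while removing the systematic offset that embedding anisotropy introduces. The data below are synthetic stand-ins.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
raw_sim = rng.uniform(0.55, 0.95, size=200)  # anisotropy inflates raw scores
human = np.clip((raw_sim - 0.55) / 0.40 + rng.normal(0, 0.05, 200), 0, 1)

# Monotone (order-preserving) calibration from raw similarity to human scale.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw_sim, human)
print("calibrated:", iso.predict(np.array([0.6, 0.75, 0.9])))
```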

Semantic Trajectory Generation for Goal-Oriented Spacecraft Rendezvous 2025-12-11
Show

Reliable real-time trajectory generation is essential for future autonomous spacecraft. While recent progress in nonconvex guidance and control is paving the way for onboard autonomous trajectory optimization, these methods still rely on extensive expert input (e.g., waypoints, constraints, mission timelines, etc.), which limits the operational scalability in real rendezvous missions. This paper introduces SAGES (Semantic Autonomous Guidance Engine for Space), a trajectory-generation framework that translates natural-language commands into spacecraft trajectories that reflect high-level intent while respecting nonconvex constraints. Experiments in two settings -- fault-tolerant proximity operations with continuous-time constraint enforcement and a free-flying robotic platform -- demonstrate that SAGES reliably produces trajectories aligned with human commands, achieving over 90% semantic-behavioral consistency across diverse behavior modes. Ultimately, this work marks an initial step toward language-conditioned, constraint-aware spacecraft trajectory generation, enabling operators to interactively guide both safety and behavior through intuitive natural-language commands with reduced expert burden.

28 pages, 12 figures. Submitted to AIAA SCITECH 2026

FAST: Foreground-aware Diffusion with Accelerated Sampling Trajectory for Segmentation-oriented Anomaly Synthesis 2025-12-11
Show

Industrial anomaly segmentation relies heavily on pixel-level annotations, yet real-world anomalies are often scarce, diverse, and costly to label. Segmentation-oriented industrial anomaly synthesis (SIAS) has emerged as a promising alternative; however, existing methods struggle to balance sampling efficiency and generation quality. Moreover, most approaches treat all spatial regions uniformly, overlooking the distinct statistical differences between anomaly and background areas. This uniform treatment hinders the synthesis of controllable, structure-specific anomalies tailored for segmentation tasks. In this paper, we propose FAST, a foreground-aware diffusion framework featuring two novel modules: the Anomaly-Informed Accelerated Sampling (AIAS) and the Foreground-Aware Reconstruction Module (FARM). AIAS is a training-free sampling algorithm specifically designed for segmentation-oriented industrial anomaly synthesis, which accelerates the reverse process through coarse-to-fine aggregation and enables the synthesis of state-of-the-art segmentation-oriented anomalies in as few as 10 steps. Meanwhile, FARM adaptively adjusts the anomaly-aware noise within the masked foreground regions at each sampling step, preserving localized anomaly signals throughout the denoising trajectory. Extensive experiments on multiple industrial benchmarks demonstrate that FAST consistently outperforms existing anomaly synthesis methods in downstream segmentation tasks. We release the code at: https://github.com/Chhro123/fast-foreground-aware-anomaly-synthesis.

Latent Action World Models for Control with Unlabeled Trajectories 2025-12-10
Show

Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely on action-conditioned trajectories, which limits effectiveness when action labels are scarce. We introduce a family of latent-action world models that jointly use action-conditioned and action-free data by learning a shared latent action representation. This latent space aligns observed control signals with actions inferred from passive observations, enabling a single dynamics model to train on large-scale unlabeled trajectories while requiring only a small set of action-labeled ones. We use the latent-action world model to learn a latent-action policy through offline reinforcement learning (RL), thereby bridging two traditionally separate domains: offline RL, which typically relies on action-conditioned data, and action-free training, which is rarely used with subsequent RL. On the DeepMind Control Suite, our approach achieves strong performance while using about an order of magnitude fewer action-labeled samples than purely action-conditioned baselines. These results show that latent actions enable training on both passive and interactive data, which makes world models learn more efficiently.

VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification 2025-12-10
Show

Synthesizing realistic human-object interactions (HOI) in video is challenging due to the complex, instance-specific interaction dynamics of both humans and objects. Incorporating controllability in video generation further adds to the complexity. Existing controllable video generation approaches face a trade-off: sparse controls like keypoint trajectories are easy to specify but lack instance-awareness, while dense signals such as optical flow, depths or 3D meshes are informative but costly to obtain. We propose VHOI, a two-stage framework that first densifies sparse trajectories into HOI mask sequences, and then fine-tunes a video diffusion model conditioned on these dense masks. We introduce a novel HOI-aware motion representation that uses color encodings to distinguish not only human and object motion, but also body-part-specific dynamics. This design incorporates a human prior into the conditioning signal and strengthens the model's ability to understand and generate realistic HOI dynamics. Experiments demonstrate state-of-the-art results in controllable HOI video generation. VHOI is not limited to interaction-only scenarios and can also generate full human navigation leading up to object interactions in an end-to-end manner. Project page: https://vcai.mpi-inf.mpg.de/projects/vhoi/.

UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories 2025-12-10
Show

Navigating complex urban environments using natural language instructions poses significant challenges for embodied agents, including noisy language instructions, ambiguous spatial references, diverse landmarks, and dynamic street scenes. Current visual navigation methods are typically limited to simulated or off-street environments, and often rely on precise goal formats, such as specific coordinates or images. This limits their effectiveness for autonomous agents like last-mile delivery robots navigating unfamiliar cities. To address these limitations, we introduce UrbanNav, a scalable framework that trains embodied agents to follow free-form language instructions in diverse urban settings. Leveraging web-scale city walking videos, we develop a scalable annotation pipeline that aligns human navigation trajectories with language instructions grounded in real-world landmarks. UrbanNav encompasses over 1,500 hours of navigation data and 3 million instruction-trajectory-landmark triplets, capturing a wide range of urban scenarios. Our model learns robust navigation policies to tackle complex urban scenarios, demonstrating superior spatial reasoning, robustness to noisy instructions, and generalization to unseen urban settings. Experimental results show that UrbanNav significantly outperforms existing methods, highlighting the potential of large-scale web video data to enable language-guided, real-world urban navigation for embodied agents.

9 pages, 5 figures, accepted to AAAI 2026

BridgeDrive: Diffusion Bridge Policy for Closed-Loop Trajectory Planning in Autonomous Driving 2025-12-10
Show

Diffusion-based planners have shown great promise for autonomous driving due to their ability to capture multi-modal driving behaviors. However, guiding these models effectively in reactive, closed-loop environments remains a significant challenge. Simple conditioning often fails to provide sufficient guidance in complex and dynamic driving scenarios. Recent work attempts to use typical expert driving behaviors (i.e., anchors) to guide diffusion models but relies on a truncated schedule, which introduces theoretical inconsistencies and can compromise performance. To address this, we introduce BridgeDrive, a novel anchor-guided diffusion bridge policy for closed-loop trajectory planning. Our approach provides a principled diffusion framework that effectively translates anchors into fine-grained trajectory plans, appropriately responding to varying traffic conditions. Our planner is compatible with efficient ODE solvers, a critical factor for real-time autonomous driving deployment. We achieve state-of-the-art performance on the Bench2Drive benchmark, improving the success rate by 7.72% over prior art.

19 pages, 7 figures, 9 tables

Microscopic Vehicle Trajectory Datasets from UAV-collected Video for Heterogeneous, Area-Based Urban Traffic 2025-12-10
Show

This paper offers openly available microscopic vehicle trajectory (MVT) datasets collected using unmanned aerial vehicles (UAVs) in heterogeneous, area-based urban traffic conditions. Traditional roadside video collection often fails in dense mixed traffic due to occlusion, limited viewing angles, and irregular vehicle movements. UAV-based recording provides a top-down perspective that reduces these issues and captures rich spatial and temporal dynamics. The datasets described here were extracted using the Data from Sky (DFS) platform and validated against manual counts, space mean speeds, and probe trajectories in earlier work. Each dataset contains time-stamped vehicle positions, speeds, longitudinal and lateral accelerations, and vehicle classifications at a resolution of 30 frames per second. Data were collected at six mid-block locations in the national capital region of India, covering diverse traffic compositions and density levels. Exploratory analyses highlight key behavioural patterns, including lane-keeping preferences, speed distributions, and lateral manoeuvres typical of heterogeneous and area-based traffic settings. These datasets are intended as a resource for the global research community to support simulation modelling, safety assessment, and behavioural studies under area-based traffic conditions. By making these empirical datasets openly available, this work offers researchers a unique opportunity to develop, test, and validate models that more accurately represent complex urban traffic environments.

This paper presents basic statistics and trends in empirically observed data from highly heterogeneous and area-based traffic while offering the datasets open source for researchers and practitioners

HeLoFusion: An Efficient and Scalable Encoder for Modeling Heterogeneous and Multi-Scale Interactions in Trajectory Prediction 2025-12-10
Show

Multi-agent trajectory prediction in autonomous driving requires a comprehensive understanding of complex social dynamics. Existing methods, however, often struggle to capture the full richness of these dynamics, particularly the co-existence of multi-scale interactions and the diverse behaviors of heterogeneous agents. To address these challenges, this paper introduces HeLoFusion, an efficient and scalable encoder for modeling heterogeneous and multi-scale agent interactions. Instead of relying on global context, HeLoFusion constructs local, multi-scale graphs centered on each agent, allowing it to effectively model both direct pairwise dependencies and complex group-wise interactions (e.g., platooning vehicles or pedestrian crowds). Furthermore, HeLoFusion tackles the critical challenge of agent heterogeneity through an aggregation-decomposition message-passing scheme and type-specific feature networks, enabling it to learn nuanced, type-dependent interaction patterns. This locality-focused approach enables a principled representation of multi-level social context, yielding powerful and expressive agent embeddings. On the challenging Waymo Open Motion Dataset, HeLoFusion achieves state-of-the-art performance, setting new benchmarks for key metrics including Soft mAP and minADE. Our work demonstrates that a locality-grounded architecture, which explicitly models multi-scale and heterogeneous interactions, is a highly effective strategy for advancing motion forecasting.

A data-driven approach to inferring travel trajectory during peak hours in urban rail transit systems 2025-12-10
Show

Refined trajectory inference in urban rail transit is of great significance to operation organization. In this paper, we develop a fully data-driven approach to inferring individual travel trajectories in urban rail transit systems. It utilizes data from the Automatic Fare Collection (AFC) and Automatic Vehicle Location (AVL) systems to infer key trajectory elements, such as the selected train, access/egress time, and transfer time. The approach includes establishing train alternative sets based on spatio-temporal constraints, data-driven adaptive trajectory inference, and travel trajectory construction. To realize data-driven adaptive trajectory inference, a data-driven parameter estimation method based on KL divergence combined with the EM algorithm (KLEM) is proposed. This method eliminates the reliance on external or survey data for parameter fitting, enhancing the robustness and applicability of the model. Furthermore, to overcome the limitations of using synthetic data for validation, this paper employs real individual travel trajectory data for verification. The results show that the approach can achieve high-precision passenger trajectory inference, with an accuracy rate of over 90% during peak hours.
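
An illustrative sketch of pairing EM with a KL-divergence criterion in the spirit of KLEM: fit a two-component Gaussian model of walk times by EM and stop when the KL divergence between successive component estimates is negligible. The data, component count, and threshold are assumptions, not the paper's model.

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL divergence between two univariate Gaussians."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2) ** 2) / (2 * s2**2) - 0.5

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(3.0, 0.5, 300), rng.normal(6.0, 1.0, 200)])

w, mu, sd = np.array([0.5, 0.5]), np.array([2.0, 7.0]), np.array([1.0, 1.0])
for it in range(200):
    # E-step: responsibilities under the current parameters.
    pdf = np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    r = w * pdf
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and standard deviations.
    nk = r.sum(axis=0)
    mu_new = (r * x[:, None]).sum(axis=0) / nk
    sd_new = np.sqrt((r * (x[:, None] - mu_new) ** 2).sum(axis=0) / nk)
    # KL-based convergence check between successive component estimates.
    kl = sum(kl_gauss(mu[k], sd[k], mu_new[k], sd_new[k]) for k in range(2))
    w, mu, sd = nk / len(x), mu_new, sd_new
    if kl < 1e-8:
        break
print(f"converged in {it + 1} iterations: means={mu.round(2)}, sds={sd.round(2)}")
```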

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance 2025-12-09
Show

We present Wan-Move, a simple and scalable framework that brings motion control to video generative models. Existing motion-controllable methods typically suffer from coarse control granularity and limited scalability, leaving their outputs insufficient for practical use. We narrow this gap by achieving precise and high-quality motion control. Our core idea is to directly make the original condition features motion-aware for guiding video synthesis. To this end, we first represent object motions with dense point trajectories, allowing fine-grained control over the scene. We then project these trajectories into latent space and propagate the first frame's features along each trajectory, producing an aligned spatiotemporal feature map that tells how each scene element should move. This feature map serves as the updated latent condition, which is naturally integrated into the off-the-shelf image-to-video model, e.g., Wan-I2V-14B, as motion guidance without any architecture change. It removes the need for auxiliary motion encoders and makes fine-tuning base models easily scalable. Through scaled training, Wan-Move generates 5-second, 480p videos whose motion controllability rivals Kling 1.5 Pro's commercial Motion Brush, as indicated by user studies. To support comprehensive evaluation, we further design MoveBench, a rigorously curated benchmark featuring diverse content categories and hybrid-verified annotations. It is distinguished by larger data volume, longer video durations, and high-quality motion annotations. Extensive experiments on MoveBench and the public dataset consistently show Wan-Move's superior motion quality. Code, models, and benchmark data are made publicly available.

NeurIPS 2025. Code and data available at https://github.com/ali-vilab/Wan-Move

Gradient-Informed Monte Carlo Fine-Tuning of Diffusion Models for Low-Thrust Trajectory Design 2025-12-09
Show

Preliminary mission design of low-thrust spacecraft trajectories in the Circular Restricted Three-Body Problem is a global search characterized by a complex objective landscape and numerous local minima. Formulating the problem as sampling from an unnormalized distribution supported on neighborhoods of locally optimal solutions, provides the opportunity to deploy Markov chain Monte Carlo methods and generative machine learning. In this work, we extend our previous self-supervised diffusion model fine-tuning framework to employ gradient-informed Markov chain Monte Carlo. We compare two algorithms - the Metropolis-Adjusted Langevin Algorithm and Hamiltonian Monte Carlo - both initialized from a distribution learned by a diffusion model. Derivatives of an objective function that balances fuel consumption, time of flight and constraint violations are computed analytically using state transition matrices. We show that incorporating the gradient drift term accelerates mixing and improves convergence of the Markov chain for a multi-revolution transfer in the Saturn-Titan system. Among the evaluated methods, MALA provides the best trade-off between performance and computational cost. Starting from samples generated by a baseline diffusion model trained on a related transfer, MALA explicitly targets Pareto-optimal solutions. Compared to a random walk Metropolis algorithm, it increases the feasibility rate from 17.34% to 63.01% and produces a denser, more diverse coverage of the Pareto front. By fine-tuning a diffusion model on the generated samples and associated reward values with reward-weighted likelihood maximization, we learn the global solution structure of the problem and eliminate the need for a tedious separate data generation phase.
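
A generic MALA step for reference: a Langevin proposal driven by the gradient of the log target, corrected by a Metropolis-Hastings accept/reject. The standard Gaussian toy target below stands in for the paper's objective (fuel, time of flight, constraint violations), whose gradients come from state transition matrices.

```python
import numpy as np

def log_p(x):               # toy unnormalized log target (standard Gaussian)
    return -0.5 * np.sum(x**2)

def grad_log_p(x):
    return -x

def mala_step(x, step, rng):
    prop = x + step * grad_log_p(x) + np.sqrt(2 * step) * rng.normal(size=x.shape)
    # Proposal log-densities for the Metropolis-Hastings acceptance ratio.
    fwd = -np.sum((prop - x - step * grad_log_p(x)) ** 2) / (4 * step)
    bwd = -np.sum((x - prop - step * grad_log_p(prop)) ** 2) / (4 * step)
    log_alpha = log_p(prop) - log_p(x) + bwd - fwd
    return prop if np.log(rng.random()) < log_alpha else x

rng = np.random.default_rng(0)
x, samples = np.ones(3), []
for _ in range(5000):
    x = mala_step(x, step=0.1, rng=rng)
    samples.append(x.copy())
print("sample mean:", np.mean(samples, axis=0).round(2))  # ~0 for this target
```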

Ergodic Trajectory Planning with Dynamic Sensor Footprints 2025-12-09
Show

This paper addresses the problem of trajectory planning for information gathering with a dynamic and resolution-varying sensor footprint. Ergodic planning offers a principled framework that balances exploration (visiting all areas) and exploitation (focusing on high-information regions) by planning trajectories such that the time spent in a region is proportional to the amount of information in that region. Existing ergodic planning often oversimplifies the sensing model by assuming a point sensor or a footprint with constant shape and resolution. In practice, the sensor footprint can drastically change over time as the robot moves, such as aerial robots equipped with downward-facing cameras, whose field of view depends on the orientation and altitude. To overcome this limitation, we propose a new metric that accounts for dynamic sensor footprints, analyze the theoretical local optimality conditions, and propose numerical trajectory optimization algorithms. Experimental results show that the proposed approach can simultaneously optimize both the trajectories and sensor footprints, with up to an order of magnitude better ergodicity than conventional methods. We also deploy our approach in a multi-drone system to ergodically cover an object in 3D space.

12 figures
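
The ergodic metric underlying this line of work compares a trajectory's time-averaged basis coefficients against those of the information density. A minimal 1D point-sensor version is sketched below; the paper's contribution is precisely the extension of such a metric to dynamic, resolution-varying footprints, which this sketch omits.

```python
import numpy as np

def ergodic_metric(traj, target_density, L=1.0, K=20):
    """Spectral ergodic metric on [0, L] for a point sensor (1D sketch)."""
    xs = np.linspace(0.0, L, 512)
    dx = xs[1] - xs[0]
    phi = target_density(xs)
    phi = phi / (phi.sum() * dx)                    # normalize to a density
    metric = 0.0
    for k in range(K + 1):
        fk = np.cos(k * np.pi * xs / L)             # cosine basis function
        ck = np.mean(np.cos(k * np.pi * traj / L))  # time average along trajectory
        phik = (phi * fk).sum() * dx                # density coefficient
        weight = (1.0 + k**2) ** -1.0               # Sobolev-type low-frequency weighting
        metric += weight * (ck - phik) ** 2
    return metric
```
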
Multi-Task Bayesian Optimization for Tuning Decentralized Trajectory Generation in Multi-UAV Systems 2025-12-09
Show

This paper investigates the use of Multi-Task Bayesian Optimization for tuning decentralized trajectory generation algorithms in multi-drone systems. We treat each task as a trajectory generation scenario defined by a specific number of drone-to-drone interactions. To model relationships across scenarios, we employ Multi-Task Gaussian Processes, which capture shared structure across tasks and enable efficient information transfer during optimization. We compare two strategies: optimizing the average mission time across all tasks and optimizing each task individually. Through a comprehensive simulation campaign, we show that single-task optimization leads to progressively shorter mission times as swarm size grows, but requires significantly more optimization time than the average-task approach.
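
The information transfer across tasks typically comes from a coregionalization kernel that couples a kernel over tuning parameters with a task-similarity matrix. A minimal sketch, assuming an RBF base kernel and a given PSD matrix `B` (both illustrative choices, not necessarily the paper's):

```python
import numpy as np

def icm_kernel(X1, t1, X2, t2, lengthscale, B):
    """Intrinsic coregionalization kernel for multi-task GPs (sketch).

    k((x, t), (x', t')) = k_rbf(x, x') * B[t, t']. Tasks here would be
    trajectory-generation scenarios with different numbers of
    drone-to-drone interactions; B lets observations from one scenario
    inform another.
    X1: (n1, d) inputs with task indices t1: (n1,); likewise X2, t2.
    """
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    k_x = np.exp(-0.5 * sq / lengthscale**2)   # RBF over tuning parameters
    return k_x * B[np.ix_(t1, t2)]             # scale by task similarity
```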

Trajectory Densification and Depth from Perspective-based Blur 2025-12-09
Show

In the absence of a mechanical stabilizer, the camera undergoes inevitable rotational dynamics during capture, which induces perspective-based blur, especially under long-exposure scenarios. From an optical standpoint, perspective-based blur is depth-position-dependent: objects residing at distinct spatial locations incur different blur levels even under the same imaging settings. Inspired by this, we propose a novel method that estimates metric depth by examining the blur pattern of a video stream and recovers a dense trajectory via a joint optical design algorithm. Specifically, we employ an off-the-shelf vision encoder and point tracker to extract video information. Then, we estimate the depth map via windowed embedding and multi-window aggregation, and densify the sparse trajectory from the optical algorithm using a vision-language model. Evaluations on multiple depth datasets demonstrate that our method attains strong performance over a large depth range, while maintaining favorable generalization. Relative to the real trajectory in handheld shooting settings, our optical algorithm achieves superior precision and the dense reconstruction maintains strong accuracy.

Data-Efficient Learning of Anomalous Diffusion with Wavelet Representations: Enabling Direct Learning from Experimental Trajectories 2025-12-09
Show

Machine learning (ML) has become a versatile tool for analyzing anomalous diffusion trajectories, yet most existing pipelines are trained on large collections of simulated data. In contrast, experimental trajectories, such as those from single-particle tracking (SPT), are typically scarce and may differ substantially from the idealized models used for simulation, leading to degradation or even breakdown of performance when ML methods are applied to real data. To address this mismatch, we introduce a wavelet-based representation of anomalous diffusion that enables data-efficient learning directly from experimental recordings. This representation is constructed by applying six complementary wavelet families to each trajectory and combining the resulting wavelet modulus scalograms. We first evaluate the wavelet representation on simulated trajectories from the andi-datasets benchmark, where it clearly outperforms both feature-based and trajectory-based methods with as few as 1000 training trajectories and still retains an advantage on large training sets. We then use this representation to learn directly from experimental SPT trajectories of fluorescent beads diffusing in F-actin networks, where the wavelet representation remains superior to existing alternatives for both diffusion-exponent regression and mesh-size classification. In particular, when predicting the diffusion exponents of experimental trajectories, a model trained on 1200 experimental tracks using the wavelet representation achieves significantly lower errors than state-of-the-art deep learning models trained purely on $10^6$ simulated trajectories. We associate this data efficiency with the emergence of distinct scale fingerprints disentangling underlying diffusion mechanisms in the wavelet spectra.

23 pages, 16 figures
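
Concretely, a representation of this kind can be assembled with PyWavelets by stacking modulus scalograms from several continuous wavelet families. The family list below is an illustrative guess, not the paper's exact set of six:

```python
import numpy as np
import pywt

def wavelet_representation(traj, scales=np.arange(1, 33)):
    """Stack modulus scalograms from several wavelet families (sketch).

    traj: 1D trajectory (or increment) series.
    Returns an array of shape (n_families, n_scales, len(traj)).
    """
    families = ["morl", "mexh", "gaus2", "cmor1.5-1.0", "shan1.5-1.0", "cgau2"]
    channels = []
    for w in families:
        coeffs, _ = pywt.cwt(traj, scales, w)  # continuous wavelet transform
        channels.append(np.abs(coeffs))        # keep the modulus scalogram
    return np.stack(channels)
```
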
TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning 2025-12-09
Show

Current autonomous driving systems often favor end-to-end frameworks, which take sensor inputs like images and learn to map them into trajectory space via neural networks. Previous work has demonstrated that models can achieve better planning performance when provided with a prior distribution of possible trajectories. However, these approaches often overlook two critical aspects: 1) The appropriate trajectory prior can vary significantly across different driving scenarios. 2) Their trajectory evaluation mechanism lacks policy-driven refinement, remaining constrained by the limitations of one-stage supervised training. To address these issues, we explore improvements in two key areas. For problem 1, we employ a Mixture of Experts (MoE) to apply different trajectory priors tailored to different scenarios. For problem 2, we utilize Reinforcement Learning to fine-tune the trajectory scoring mechanism. Additionally, we integrate models with different perception backbones to enhance perceptual features. Our integrated model achieved a score of 51.08 on the navsim ICCV benchmark, securing third place.

CDKFormer: Contextual Deviation Knowledge-Based Transformer for Long-Tail Trajectory Prediction 2025-12-09
Show

Predicting the future movements of surrounding vehicles is essential for ensuring the safe operation and efficient navigation of autonomous vehicles (AVs) in urban traffic environments. Existing vehicle trajectory prediction methods primarily focus on improving overall performance, yet they struggle to address long-tail scenarios effectively. This limitation often leads to poor predictions in rare cases, significantly increasing the risk of safety incidents. Taking the Argoverse 2 motion forecasting dataset as an example, we first investigate the long-tail characteristics in trajectory samples from two perspectives, individual motion and group interaction, and derive deviation features to distinguish abnormal from regular scenarios. On this basis, we propose CDKFormer, a Contextual Deviation Knowledge-based Transformer model for long-tail trajectory prediction. CDKFormer integrates an attention-based scene context fusion module to encode spatiotemporal interaction and road topology. An additional deviation feature fusion module is proposed to capture the dynamic deviations in the target vehicle status. We further introduce a dual query-based decoder, supported by a multi-stream decoder block, to sequentially decode heterogeneous scene deviation features and generate multimodal trajectory predictions. Extensive experiments demonstrate that CDKFormer achieves state-of-the-art performance, significantly enhancing prediction accuracy and robustness for long-tailed trajectories compared to existing methods, thus advancing the reliability of AVs in complex real-world environments.

Enabling Autonomous Navigation in a Snake Robot through Visual-Inertial Odometry and Closed-Loop Trajectory Tracking Control 2025-12-09
Show

Snake robots offer exceptional mobility across extreme terrain inaccessible to conventional rovers, yet their highly articulated bodies present fundamental challenges for autonomous navigation in environments lacking external tracking infrastructure. This thesis develops a complete autonomy pipeline for COBRA, an 11 degree-of-freedom modular snake robot designed for planetary exploration. While the robot's biologically inspired serpentine gaits achieve impressive mobility, prior work has relied entirely on open-loop teleoperation. This approach integrates onboard visual-inertial SLAM, reduced-order state estimation, and closed-loop trajectory tracking to enable autonomous waypoint navigation. A depth camera paired with edge computing performs real-time localization during dynamic locomotion, validated against motion-capture ground truth to characterize drift behavior and failure modes unique to snake robot platforms. A reduced-order framework estimates Center-of-Mass pose, driving a closed-loop controller that modulates CPG gait parameters through distance-dependent yaw error blending. Physical experiments validate the complete system, demonstrating accurate multi-waypoint tracking and establishing foundations for autonomous snake robot navigation.

Master's thesis
Zo3T: Zero-Shot 3D-Aware Trajectory-Guided Image-to-Video Generation via Test-Time Training 2025-12-09
Show

Trajectory-Guided image-to-video (I2V) generation aims to synthesize videos that adhere to user-specified motion instructions. Existing methods typically rely on computationally expensive fine-tuning on scarce annotated datasets. Although some zero-shot methods attempt trajectory control in the latent space, they may yield unrealistic motion by neglecting 3D perspective and creating a misalignment between the manipulated latents and the network's noise predictions. To address these challenges, we introduce Zo3T, a novel zero-shot test-time-training framework for trajectory-guided generation with three core innovations: First, we incorporate a 3D-Aware Kinematic Projection, leveraging inferred scene depth to derive perspective-correct affine transformations for target regions. Second, we introduce Trajectory-Guided Test-Time LoRA, a mechanism that dynamically injects and optimizes ephemeral LoRA adapters into the denoising network alongside the latent state. Driven by a regional feature consistency loss, this co-adaptation effectively enforces motion constraints while allowing the pre-trained model to locally adapt its internal representations to the manipulated latent, thereby ensuring generative fidelity and on-manifold adherence. Finally, we develop Guidance Field Rectification, which refines the denoising evolutionary path by optimizing the conditional guidance field through a one-step lookahead strategy, ensuring efficient generative progression towards the target trajectory. Zo3T significantly enhances 3D realism and motion accuracy in trajectory-controlled I2V generation, demonstrating superior performance over existing training-based and zero-shot approaches.

CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space 2025-12-08
Show

Clinical decision-making in oncology requires predicting dynamic disease evolution, a task current static AI predictors cannot perform. While world models (WMs) offer a paradigm for generative prediction, existing medical applications remain limited. Existing methods often rely on stochastic diffusion models, focusing on visual reconstruction rather than causal, physiological transitions. Furthermore, in the medical domain, models like MeWM typically ignore patient-specific temporal and clinical contexts and lack a feedback mechanism to link predictions to treatment decisions. To address these gaps, we introduce CLARITY, a medical world model that forecasts disease evolution directly within a structured latent space. It explicitly integrates time intervals (temporal context) and patient-specific data (clinical context) to model treatment-conditioned progression as a smooth, interpretable trajectory, and thus generate physiologically faithful, individualized treatment plans. Finally, CLARITY introduces a novel prediction-to-decision framework, translating latent rollouts into transparent, actionable recommendations. CLARITY demonstrates state-of-the-art performance in treatment planning. On the MU-Glioma-Post dataset, our approach outperforms the recent MeWM by 12%, and significantly surpasses all other medical-specific large language models.

T1-PILOT: Optimized Trajectories for T1 Mapping Acceleration 2025-12-07
Show

Cardiac T1 mapping provides critical quantitative insights into myocardial tissue composition, enabling the assessment of pathologies such as fibrosis, inflammation, and edema. However, the inherently dynamic nature of the heart imposes strict limits on acquisition times, making high-resolution T1 mapping a persistent challenge. Compressed sensing (CS) approaches have reduced scan durations by undersampling k-space and reconstructing images from partial data, and recent studies show that jointly optimizing the undersampling patterns with the reconstruction network can substantially improve performance. Still, most current T1 mapping pipelines rely on static, hand-crafted masks that do not exploit the full acceleration and accuracy potential. In this work, we introduce T1-PILOT: an end-to-end method that explicitly incorporates the T1 signal relaxation model into the sampling-reconstruction framework to guide the learning of non-Cartesian trajectories, cross-frame alignment, and T1 decay estimation. Through extensive experiments on the CMRxRecon dataset, T1-PILOT significantly outperforms several baseline strategies (including learned single-mask and fixed radial or golden-angle sampling schemes), achieving higher T1 map fidelity at greater acceleration factors. In particular, we observe consistent gains in PSNR and VIF relative to existing methods, along with marked improvements in delineating finer myocardial structures. Our results highlight that optimizing sampling trajectories in tandem with the physical relaxation model leads to both enhanced quantitative accuracy and reduced acquisition times.

Know your Trajectory -- Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis 2025-12-07
Show

As Reinforcement Learning (RL) agents are increasingly deployed in real-world applications, ensuring their behavior is transparent and trustworthy is paramount. A key component of trust is explainability, yet much of the work in Explainable RL (XRL) focuses on local, single-step decisions. This paper addresses the critical need for explaining an agent's long-term behavior through trajectory-level analysis. We introduce a novel framework that ranks entire trajectories by defining and aggregating a new state-importance metric. This metric combines the classic Q-value difference with a "radical term" that captures the agent's affinity to reach its goal, providing a more nuanced measure of state criticality. We demonstrate that our method successfully identifies optimal trajectories from a heterogeneous collection of agent experiences. Furthermore, by generating counterfactual rollouts from critical states within these trajectories, we show that the agent's chosen path is robustly superior to alternatives, thereby providing a powerful "Why this, and not that?" explanation. Our experiments in standard OpenAI Gym environments validate that our proposed importance metric is more effective at identifying optimal behaviors compared to classic approaches, offering a significant step towards trustworthy autonomous systems.

Accepted at 4th Deployable AI Workshop at AAAI 2026
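
A schematic version of such an importance-based ranking is sketched below. The Q-value spread term is standard; `goal_affinity` is a placeholder for the paper's "radical term", whose exact form is not reproduced here.

```python
import numpy as np

def state_importance(q_values, goal_affinity, beta=0.5):
    """Importance of a state (sketch).

    Classic term: max_a Q(s, a) - min_a Q(s, a), i.e., how much the
    action choice matters at this state. goal_affinity in [0, 1] is a
    hypothetical stand-in for the paper's goal-reaching term.
    """
    spread = np.max(q_values) - np.min(q_values)
    return spread + beta * goal_affinity

def rank_trajectories(trajs):
    """Rank trajectories by mean state importance, best first.

    trajs: list of trajectories, each a list of (q_values, goal_affinity).
    """
    scores = [np.mean([state_importance(q, g) for q, g in traj]) for traj in trajs]
    return np.argsort(scores)[::-1]
```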

A New Trajectory-Oriented Approach to Enhancing Comprehensive Crowd Navigation Performance 2025-12-07
Show

Crowd navigation has garnered considerable research interest in recent years, especially with the proliferating application of deep reinforcement learning (DRL) techniques. Many studies, however, do not sufficiently analyze the relative priorities among evaluation metrics, which compromises the fair assessment of methods with divergent objectives. Furthermore, trajectory-continuity metrics, specifically those requiring $C^2$ smoothness, are rarely incorporated. Current DRL approaches generally prioritize efficiency and proximal comfort, often neglecting trajectory optimization or addressing it only through simplistic, unvalidated smoothness rewards. Nevertheless, effective trajectory optimization is essential to ensure naturalness, enhance comfort, and maximize the energy efficiency of any navigation system. To address these gaps, this paper proposes a unified framework that enables the fair and transparent assessment of navigation methods by examining the prioritization and joint evaluation of multiple optimization objectives. We further propose a novel reward-shaping strategy that explicitly emphasizes trajectory-curvature optimization. The resulting trajectory quality and adaptability are significantly enhanced across multi-scale scenarios. Through extensive 2D and 3D experiments, we demonstrate that the proposed method achieves superior performance compared to state-of-the-art approaches.

8 pages, 6 figures
Trajectory Optimization for Cellular-Connected UAV in Complex Environment with Partial CKM 2025-12-06
Show

Cellular-connected unmanned aerial vehicles (UAVs) are expected to play an increasingly important role in future wireless networks. To facilitate the reliable navigation for cellular-connected UAVs, channel knowledge map (CKM) is considered a promising approach capable of tackling the non-negligible co-channel interference resulting from the high line-of-sight (LoS) probability of air-ground (AG) channels. Nevertheless, due to measurement constraints and the aging of information, CKM is usually incomplete and needs to be regularly updated to capture the dynamic nature of complex environments. In this paper, we propose a novel trajectory design strategy in which UAV navigation and CKM completion are incorporated into a common framework, enabling mutual benefits for both tasks. Specifically, a cellular-connected UAV deployed in an urban environment measures the radio information during its flight and completes the CKM with Kriging interpolation. Based on the method of grid discretization and spherical approximation, a mixed-integer multi-objective optimization problem is formulated. The problem falls into the category of combinatorial mathematics and is essentially equivalent to determining an optimum sequence of grid points to traverse. Through proper mathematical manipulation, the problem is reformulated as variants of two classic models in graph theory, namely the shortest-path problem (SPP) and the traveling salesman problem (TSP). Two navigation strategies based on the two different models are proposed and thoroughly compared based on numerical results to provide implementable methods for engineering practice and reveal the trade-offs between UAV navigation and CKM completion. Simulation results reveal that the proposed navigation strategies can quickly expand the Pareto boundary of the problem and approach the performance of fully-known CKM.

Multi-Modal Zero-Shot Prediction of Color Trajectories in Food Drying 2025-12-05
Show

Food drying is widely used to reduce moisture content, ensure safety, and extend shelf life. Color evolution of food samples is an important indicator of product quality in food drying. Although existing studies have examined color changes under different drying conditions, current approaches primarily rely on low-dimensional color features and cannot fully capture the complex, dynamic color trajectories of food samples. Moreover, existing modeling approaches lack the ability to generalize to unseen process conditions. To address these limitations, we develop a novel multi-modal color-trajectory prediction method that integrates high-dimensional temporal color information with drying process parameters to enable accurate and data-efficient color trajectory prediction. Under unseen drying conditions, the model attains RMSEs of 2.12 for cookie drying and 1.29 for apple drying, reducing errors by over 90% compared with baseline models. These experimental results demonstrate the model's superior accuracy, robustness, and broad applicability.

iMotion-LLM: Instruction-Conditioned Trajectory Generation 2025-12-05
Show

We introduce iMotion-LLM, a large language model (LLM) integrated with trajectory prediction modules for interactive motion generation. Unlike conventional approaches, it generates feasible, safety-aligned trajectories based on textual instructions, enabling adaptable and context-aware driving behavior. It combines an encoder-decoder multimodal trajectory prediction model with a pre-trained LLM fine-tuned using LoRA, projecting scene features into the LLM input space and mapping special tokens to a trajectory decoder for text-based interaction and interpretable driving. To support this framework, we introduce two datasets: 1) InstructWaymo, an extension of the Waymo Open Motion Dataset with direction-based motion instructions, and 2) Open-Vocabulary InstructNuPlan, which features safety-aligned instruction-caption pairs and corresponding safe trajectory scenarios. Our experiments validate that instruction conditioning enables trajectory generation that follows the intended condition. iMotion-LLM demonstrates strong contextual comprehension, achieving 84% average accuracy in direction feasibility detection and 96% average accuracy in safety evaluation of open-vocabulary instructions. This work lays the foundation for text-guided motion generation in autonomous driving, supporting simulated data generation, model interpretability, and robust safety alignment testing for trajectory generation models. Our code, pre-trained model, and datasets are available at: https://vision-cair.github.io/iMotion-LLM/.

Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning 2025-12-05
Show

This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. Leveraging Bayesian Active Inference, the approach combines expert-generated demonstrations with probabilistic generative modeling to encode high-level symbolic planning, low-level motion policies, and wireless signal feedback. During deployment, the UAV performs online inference to anticipate interference, localize jammers, and adapt its trajectory accordingly, without prior knowledge of jammer locations. Simulation results demonstrate that the proposed method achieves near-expert performance, significantly reducing communication interference and mission cost compared to model-free reinforcement learning baselines, while maintaining robust generalization in dynamic environments.

This paper has been accepted for the 2026 IEEE Consumer Communications & Networking Conference (IEEE CCNC 2026)

Scenario-aware Uncertainty Quantification for Trajectory Prediction with Statistical Guarantees 2025-12-05
Show

Reliable uncertainty quantification in trajectory prediction is crucial for safety-critical autonomous driving systems, yet existing deep learning predictors lack uncertainty-aware frameworks adaptable to heterogeneous real-world scenarios. To bridge this gap, we propose a novel scenario-aware uncertainty quantification framework that provides predicted trajectories with prediction intervals and reliability assessment. To begin with, predicted trajectories from the trained predictor and their ground truth are projected onto the map-derived reference routes within the Frenet coordinate system. We then employ CopulaCPTS as the conformal calibration method to generate temporal prediction intervals for distinct scenarios as the uncertainty measure. Building upon this, within the proposed trajectory reliability discriminator (TRD), mean error and calibrated confidence intervals are synergistically analyzed to establish reliability models for different scenarios. Subsequently, the risk-aware discriminator leverages a joint risk model that integrates longitudinal and lateral prediction intervals within the Frenet coordinate system to identify critical points. This enables segmentation of trajectories into reliable and unreliable segments, providing downstream planning modules with actionable reliability information. We evaluated our framework using the real-world nuPlan dataset, demonstrating its effectiveness in scenario-aware uncertainty quantification and reliability assessment across diverse driving contexts.
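
As background, a plain per-step split-conformal calibration (the simpler ancestor of CopulaCPTS) can be written in a few lines. CopulaCPTS additionally models dependence across horizon steps with a copula, which this sketch ignores:

```python
import numpy as np

def per_step_intervals(preds_cal, truth_cal, preds_test, alpha=0.1):
    """Split-conformal prediction intervals per forecast step (sketch).

    preds_cal, truth_cal: (n_cal, H) calibration predictions and truth.
    preds_test: (n_test, H) test predictions.
    Returns (lower, upper) bounds of the same shape as preds_test.
    """
    n_cal = preds_cal.shape[0]
    scores = np.abs(preds_cal - truth_cal)   # nonconformity score per step
    # Finite-sample-corrected quantile level, clipped to 1.
    q_level = min(np.ceil((n_cal + 1) * (1 - alpha)) / n_cal, 1.0)
    q = np.quantile(scores, q_level, axis=0)
    return preds_test - q, preds_test + q    # symmetric padding per step
```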

Momentum-constrained Hybrid Heuristic Trajectory Optimization Framework with Residual-enhanced DRL for Visually Impaired Scenarios 2025-12-05
Show

This paper proposes a momentum-constrained hybrid heuristic trajectory optimization framework (MHHTOF) tailored for assistive navigation in visually impaired scenarios, integrating trajectory sampling generation, optimization and evaluation with residual-enhanced deep reinforcement learning (DRL). In the first stage, a heuristic trajectory sampling cluster (HTSC) is generated in the Frenet coordinate system using third-order interpolation with fifth-order polynomials and momentum-constrained trajectory optimization (MTO) constraints to ensure smoothness and feasibility. After the first-stage cost evaluation, the second stage leverages a residual-enhanced actor-critic network with LSTM-based temporal feature modeling to adaptively refine trajectory selection in the Cartesian coordinate system. A dual-stage cost modeling mechanism (DCMM) with weight transfer aligns semantic priorities across stages, supporting human-centered optimization. Experimental results demonstrate that the proposed LSTM-ResB-PPO achieves significantly faster convergence, attaining stable policy performance in approximately half the training iterations required by the PPO baseline, while simultaneously enhancing both reward outcomes and training stability. Compared to the baseline method, the selected model reduces average cost and cost variance by 30.3% and 53.3%, and lowers ego and obstacle risks by over 77%. These findings validate the framework's effectiveness in enhancing robustness, safety, and real-time feasibility in complex assistive planning tasks.

Upon further internal evaluation, we found that the current version does not adequately represent the clarity and completeness that we intend for this work. To avoid possible misunderstanding caused by this preliminary form, we request withdrawal. A refined version will be prepared privately before any further dissemination

Hierarchical Task Offloading and Trajectory Optimization in Low-Altitude Intelligent Networks Via Auction and Diffusion-based MARL 2025-12-05
Show

Low-altitude intelligent networks (LAINs) have emerged as a promising architecture for delivering low-latency and energy-efficient edge intelligence in dynamic and infrastructure-limited environments. By integrating unmanned aerial vehicles (UAVs), aerial base stations, and terrestrial base stations, LAINs can support mission-critical applications such as disaster response, environmental monitoring, and real-time sensing. However, these systems face key challenges, including energy-constrained UAVs, stochastic task arrivals, and heterogeneous computing resources. To address these issues, we propose an integrated air-ground collaborative network and formulate a time-dependent integer nonlinear programming problem that jointly optimizes UAV trajectory planning and task offloading decisions. The problem is challenging to solve due to temporal coupling among decision variables. Therefore, we design a hierarchical learning framework with two timescales. At the large timescale, a Vickrey-Clarke-Groves auction mechanism enables energy-aware and incentive-compatible trajectory assignment. At the small timescale, we propose diffusion-heterogeneous-agent proximal policy optimization, a generative multi-agent reinforcement learning algorithm that embeds latent diffusion models into actor networks. Each UAV samples actions from a Gaussian prior and refines them via observation-conditioned denoising, enhancing adaptability and policy diversity. Extensive simulations show that our framework outperforms baselines in energy efficiency, task success rate, and convergence performance.
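
For intuition, the VCG (Clarke pivot) payments used at the large timescale can be computed as below. The brute-force assignment search is for illustration only and assumes a small number of UAVs; `values[i][j]` is a hypothetical bid of UAV i for trajectory j.

```python
from itertools import permutations

def vcg_assignment(values):
    """VCG mechanism for assigning UAVs to trajectories (sketch).

    values[i][j]: UAV i's value (e.g., negative energy cost) for
    trajectory j. Returns the welfare-maximizing assignment and the
    Clarke pivot payments that make truthful bidding optimal.
    """
    n = len(values)
    agents = range(n)

    def best(excluded):
        # Maximize total value of non-excluded agents over all assignments.
        opts = [
            (sum(values[i][p[i]] for i in agents if i not in excluded), p)
            for p in permutations(range(n))
        ]
        return max(opts)

    welfare, alloc = best(set())
    payments = []
    for i in agents:
        others_without_i, _ = best({i})               # welfare if i were absent
        others_with_i = welfare - values[i][alloc[i]]  # others' welfare under alloc
        payments.append(others_without_i - others_with_i)
    return alloc, payments
```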

MIGHTY: Hermite Spline-based Efficient Trajectory Planning 2025-12-04
Show

Hard-constraint trajectory planners often rely on commercial solvers and demand substantial computational resources. Existing soft-constraint methods achieve faster computation, but either (1) decouple spatial and temporal optimization or (2) restrict the search space. To overcome these limitations, we introduce MIGHTY, a Hermite spline-based planner that performs spatiotemporal optimization while fully leveraging the continuous search space of a spline. In simulation, MIGHTY achieves a 9.3% reduction in computation time and a 13.1% reduction in travel time over state-of-the-art baselines, with a 100% success rate. In hardware, MIGHTY completes multiple high-speed flights up to 6.7 m/s in a cluttered static environment and long-duration flights with dynamically added obstacles.

9 pages, 10 figures
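
For reference, evaluating one cubic Hermite segment, the basic primitive behind such spline parameterizations, looks as follows; the joint spatiotemporal optimization over knots, tangents, and timing is the paper's contribution and is not shown.

```python
import numpy as np

def hermite_eval(p0, p1, m0, m1, t):
    """Evaluate a cubic Hermite segment at t in [0, 1] (sketch).

    p0, p1: endpoint positions (d,); m0, m1: endpoint tangents (d,).
    t may be a scalar or an array of parameter values.
    """
    t = np.asarray(t)[..., None]          # broadcast over spatial dimensions
    h00 = 2 * t**3 - 3 * t**2 + 1         # standard Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1
```
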
On Disturbance-Aware Minimum-Time Trajectory Planning: Evidence from Tests on a Dynamic Driving Simulator 2025-12-04
Show

This work investigates how disturbance-aware, robustness-embedded reference trajectories translate into driving performance when executed by professional drivers in a dynamic simulator. Three planned reference trajectories are compared against a free-driving baseline (NOREF) to assess trade-offs between lap time (LT) and steering effort (SE): NOM, the nominal time-optimal trajectory; TLC, a track-limit-robust trajectory obtained by tightening margins to the track edges; and FLC, a friction-limit-robust trajectory obtained by tightening against axle and tire saturation. All trajectories share the same minimum lap-time objective with a small steering-smoothness regularizer and are evaluated by two professional drivers using a high-performance car on a virtual track. The trajectories derive from a disturbance-aware minimum-lap-time framework recently proposed by the authors, where worst-case disturbance growth is propagated over a finite horizon and used to tighten tire-friction and track-limit constraints, preserving performance while providing probabilistic safety margins. LT and SE are used as performance indicators, while RMS lateral deviation, speed error, and drift angle characterize driving style. Results show a Pareto-like LT-SE trade-off: NOM yields the shortest LT but highest SE; TLC minimizes SE at the cost of longer LT; FLC lies near the efficient frontier, substantially reducing SE relative to NOM with only a small LT increase. Removing trajectory guidance (NOREF) increases both LT and SE, confirming that reference trajectories improve pace and control efficiency. Overall, the findings highlight reference-based and disturbance-aware planning, especially FLC, as effective tools for training and for achieving fast yet stable trajectories.

18 pages, 11 figures, 5 tables

TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents 2025-12-04
Show

Building Graphical User Interface (GUI) agents is a promising research direction, which simulates human interaction with computers or mobile phones to perform diverse GUI tasks. However, a major challenge in developing generalized GUI agents is the lack of sufficient trajectory data across various operating systems and applications, mainly due to the high cost of manual annotations. In this paper, we propose the TongUI framework that builds generalized GUI agents by learning from rich multimodal web tutorials. Concretely, we crawl and process online GUI tutorials (such as videos and articles) into GUI agent trajectory data, through which we produce the GUI-Net dataset containing 143K trajectories across five operating systems and more than 200 applications. We develop the TongUI agent by fine-tuning Qwen2.5-VL-3B/7B models on GUI-Net, which shows remarkable performance improvements on commonly used grounding and navigation benchmarks, outperforming baseline agents by about 10% on multiple benchmarks, showing the effectiveness of the GUI-Net dataset and underscoring the significance of our TongUI framework. We will fully open-source the code, the GUI-Net dataset, and the trained models soon.

AAAI 2026
Blind Radio Mapping via Spatially Regularized Bayesian Trajectory Inference 2025-12-04
Show

Radio maps enable intelligent wireless applications by capturing the spatial distribution of channel characteristics. However, conventional construction methods demand extensive location-labeled data, which are costly and impractical in many real-world scenarios. This paper presents a blind radio map construction framework that infers user trajectories from indoor multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) channel measurements without relying on location labels. It first proves that channel state information (CSI) under non-line-of-sight (NLOS) conditions exhibits spatial continuity under a quasi-specular environmental model, allowing the derivation of a CSI-distance metric that is proportional to the corresponding physical distance. For rectilinear trajectories in Poisson-distributed access point (AP) deployments, it is shown that the Cramér-Rao Lower Bound (CRLB) of the localization error vanishes asymptotically, even under poor angular resolution. Building on these theoretical results, a spatially regularized Bayesian inference framework is developed that jointly estimates channel features, distinguishes line-of-sight (LOS)/NLOS conditions, and recovers user trajectories. Experiments on a ray-tracing dataset demonstrate an average localization error of 0.68 m and a beam map reconstruction error of 3.3%, validating the effectiveness of the proposed blind mapping method.

Differential Filtering in a Common Basic Cycle: Multi-Major Trajectories and Structural Bottlenecks in Exact Sciences and Engineering Degrees 2025-12-03
Show

Universities often present the Common Basic Cycle (CBC) as a neutral levelling stage shared by several degree programmes. Using twenty years of longitudinal administrative records from a Faculty of Engineering and Exact Sciences, this study tests whether the CBC actually operates as a uniform gateway or as a differential filter across majors. We reconstruct student trajectories for 24,017 entrants, identifying CBC subjects (year level <= 1), destination major, time to exit from the CBC, and final outcome (progression to upper cycle, drop-out, or right-censoring). The analysis combines transition matrices, Kaplan-Meier survival curves, stratified Cox models and subject-level logistic models of drop-out after failure, extended with multi-major enrolment data and a pre/post 2006 curriculum reform comparison. Results show that the CBC functions as a strongly differential filter. Post-reform, the probability of progressing to the upper cycle in the same major ranges from about 0.20 to 0.70 across programmes, while overall drop-out in the CBC exceeds 60%. Early Mathematics modules (introductory calculus and algebra) emerge as structural bottlenecks, combining low pass rates with a two- to three-fold increase in the hazard of leaving the system after failure, with markedly different severity by destination major. Multi-major enrolment, often treated administratively as indecision, is instead associated with lower drop-out, suggesting an adaptive exploration of feasible trajectories. The findings portray the CBC not as a neutral academic foyer, but as a structured sorting device whose impact depends sharply on the targeted degree and on the opportunity to explore alternative majors.

24 pages, 5 figures , 3 tables

Sliding Mode Control and Subspace Stabilization Methodology for the Orbital Stabilization of Periodic Trajectories 2025-12-03
Show

This paper presents a combined sliding-mode control and subspace stabilization methodology for orbital stabilization of periodic trajectories in underactuated mechanical systems with one degree of underactuation. The approach starts with partial feedback linearization and stabilization. Then, transverse linearization along the reference orbit is computed, resulting in a periodic linear time-varying system with a stable subspace. Sliding-mode control drives trajectories toward this subspace. The proposed design avoids solving computationally intensive periodic LQR problems and improves robustness to matched disturbances. The methodology is validated through experiments on the Butterfly robot.

Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training 2025-12-03
Show

Reinforcement learning (RL) is a critical component of large language model (LLM) post-training. However, on-policy algorithms used for post-training are not naturally robust to a diversified content of experience replay buffers, which asynchronous off-policy actors can efficiently populate in parallel to training. We propose efficiently learning on such off-policy data via Trajectory Balance with Asynchrony (TBA), an approach to asynchronous RL for LLMs that leverages the principled off-policy TB objective. On math, preference-tuning, and automated red-teaming tasks, we post-train models ranging from Pythia 410M to Qwen 2.5 7B, finding TBA offers speed and performance boosts over strong baselines like Online DPO and Dr. GRPO. Beyond TBA's performance benefits (high accuracy even as asynchrony grows) and speedups ($4\times$ or more), we show its reward- and recency-prioritizing sampling enable further gains as data generation is scaled. Our code is available at https://github.com/bbartoldson/TBA.

NeurIPS 2025; 27 pages
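
For sequence generation with a unique parent per state, the Trajectory Balance objective reduces to a squared residual between the learned log partition function, the policy log-likelihood, and the log-reward. A minimal sketch, with all tensor names assumed:

```python
import torch

def tb_loss(log_z, logprob_sum, log_reward):
    """Trajectory Balance loss for autoregressive samplers (sketch).

    For token-by-token generation each sequence has a unique parent,
    so the backward-policy term drops out and TB reduces to
    (log Z + log p_theta(y|x) - log R(y))^2. log_z is a learned scalar;
    logprob_sum is the policy log-likelihood of each sampled response;
    log_reward comes from the task reward. Because the objective is
    valid for off-policy samples, replay buffers filled by
    asynchronous actors can be consumed directly.
    """
    return (log_z + logprob_sum - log_reward).pow(2).mean()
```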

Origin-Conditional Trajectory Encoding: Measuring Urban Configurational Asymmetries through Neural Decomposition 2025-12-03
Show

Urban analytics increasingly relies on AI-driven trajectory analysis, yet current approaches suffer from methodological fragmentation: trajectory learning captures movement patterns but ignores spatial context, while spatial embedding methods encode street networks but miss temporal dynamics. Three gaps persist: (1) lack of joint training that integrates spatial and temporal representations, (2) origin-agnostic treatment that ignores directional asymmetries in navigation ($A \to B \ne B \to A$), and (3) over-reliance on auxiliary data (POIs, imagery) rather than fundamental geometric properties of urban space. We introduce a conditional trajectory encoder that jointly learns spatial and movement representations while preserving origin-dependent asymmetries using geometric features. This framework decomposes urban navigation into shared cognitive patterns and origin-specific spatial narratives, enabling quantitative measurement of cognitive asymmetries across starting locations. Our bidirectional LSTM processes visibility ratio and curvature features conditioned on learnable origin embeddings, decomposing representations into shared urban patterns and origin-specific signatures through contrastive learning. Results from six synthetic cities and real-world validation on Beijing's Xicheng District demonstrate that urban morphology creates systematic cognitive inequalities. This provides urban planners quantitative tools for assessing experiential equity, offers architects insights into layout decisions' cognitive impacts, and enables origin-aware analytics for navigation systems.

ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation 2025-12-03
Show

We propose ReCamDriving, a purely vision-based, camera-controlled novel-trajectory video generation framework. While repair-based methods fail to restore complex artifacts and LiDAR-based approaches rely on sparse and incomplete cues, ReCamDriving leverages dense and scene-complete 3DGS renderings for explicit geometric guidance, achieving precise camera-controllable generation. To mitigate overfitting to restoration behaviors when conditioned on 3DGS renderings, ReCamDriving adopts a two-stage training paradigm: the first stage uses camera poses for coarse control, while the second stage incorporates 3DGS renderings for fine-grained viewpoint and geometric guidance. Furthermore, we present a 3DGS-based cross-trajectory data curation strategy to eliminate the train-test gap in camera transformation patterns, enabling scalable multi-trajectory supervision from monocular videos. Based on this strategy, we construct the ParaDrive dataset, containing over 110K parallel-trajectory video pairs. Extensive experiments demonstrate that ReCamDriving achieves state-of-the-art camera controllability and structural consistency.

Project page: https://recamdriving.github.io/

Federated Learning and Trajectory Compression for Enhanced AIS Coverage 2025-12-03
Show

This paper presents the VesselEdge system, which leverages federated learning and bandwidth-constrained trajectory compression to enhance maritime situational awareness by extending AIS coverage. VesselEdge transforms vessels into mobile sensors, enabling real-time anomaly detection and efficient data transmission over low-bandwidth connections. The system integrates the M3fed model for federated learning and the BWC-DR-A algorithm for trajectory compression, prioritizing anomalous data. Preliminary results demonstrate the effectiveness of VesselEdge in improving AIS coverage and situational awareness using historical data.

Experimental Characterization of Fingertip Trajectory following for a 3-DoF Series-Parallel Hybrid Robotic Finger 2025-12-02
Show

Task-space control of robotic fingers is a critical enabler of dexterous manipulation, as manipulation objectives are most naturally specified in terms of fingertip motions and applied forces rather than individual joint angles. While task-space planning and control have been extensively studied for larger, arm-scale manipulators, demonstrations of precise task-space trajectory tracking in compact, multi-DoF robotic fingers remain scarce. In this paper, we present the physical prototyping and experimental characterization of a three-degree-of-freedom, linkage-driven, series-parallel robotic finger with analytic forward kinematics and a closed-form Jacobian. A resolved motion rate control (RMRC) scheme is implemented to achieve closed-loop task-space trajectory tracking. We experimentally evaluate the fingertip tracking performance across a variety of trajectories, including straight lines, circles, and more complex curves, and report millimeter-level accuracy. To the best of our knowledge, this work provides one of the first systematic experimental demonstrations of precise task-space trajectory tracking in a linkage-driven robotic finger, thereby establishing a benchmark for future designs aimed at dexterous in-hand manipulation.
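
The RMRC loop itself is compact: the task-space error is fed back through the Jacobian pseudoinverse to produce joint rates. A minimal sketch, with `fk` and `jacobian` standing in for the finger's closed-form kinematics:

```python
import numpy as np

def rmrc_step(q, x_des, xdot_des, fk, jacobian, kp=5.0, dt=0.01):
    """One resolved motion rate control update (sketch).

    qdot = J^+(q) (xdot_des + kp * (x_des - fk(q))): the desired
    fingertip velocity, plus proportional correction of the position
    error, mapped to joint rates via the Jacobian pseudoinverse.
    """
    err = x_des - fk(q)                            # task-space position error
    qdot = np.linalg.pinv(jacobian(q)) @ (xdot_des + kp * err)
    return q + qdot * dt                           # integrate joint command
```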

TGDD: Trajectory Guided Dataset Distillation with Balanced Distribution 2025-12-02
Show

Dataset distillation compresses large datasets into compact synthetic ones to reduce storage and computational costs. Among various approaches, distribution matching (DM)-based methods have attracted attention for their high efficiency. However, they often overlook the evolution of feature representations during training, which limits the expressiveness of synthetic data and weakens downstream performance. To address this issue, we propose Trajectory Guided Dataset Distillation (TGDD), which reformulates distribution matching as a dynamic alignment process along the model's training trajectory. At each training stage, TGDD captures evolving semantics by aligning the feature distribution between the synthetic and original dataset. Meanwhile, it introduces a distribution constraint regularization to reduce class overlap. This design helps synthetic data preserve both semantic diversity and representativeness, improving performance in downstream tasks. Without additional optimization overhead, TGDD achieves a favorable balance between performance and efficiency. Experiments on ten datasets demonstrate that TGDD achieves state-of-the-art performance, notably a 5.0% accuracy gain on high-resolution benchmarks.

Accepted in AAAI 2026

Multi-Domain Enhanced Map-Free Trajectory Prediction with Selective Attention 2025-12-02
Show

Trajectory prediction is crucial for the reliability and safety of autonomous driving systems, yet it remains a challenging task in complex interactive scenarios. Existing methods often struggle to efficiently extract valuable scene information from redundant data, thereby reducing computational efficiency and prediction accuracy, especially when dealing with intricate agent interactions. To address these challenges, we propose a novel map-free trajectory prediction algorithm that achieves trajectory prediction across the temporal, spatial, and frequency domains. Specifically, in temporal information processing, we utilize a Mixture of Experts (MoE) mechanism to adaptively select critical frequency components. Concurrently, we extract these components and integrate multi-scale temporal features. Subsequently, a selective attention module is proposed to filter out redundant information in both temporal sequences and spatial interactions. Finally, we design a multimodal decoder. Under the supervision of patch-level and point-level losses, we obtain reasonable trajectory results. Experiments on the nuScenes dataset demonstrate the superiority of our algorithm, validating its effectiveness in handling complex interactive scenarios.

Physical ID-Transfer Attacks against Multi-Object Tracking via Adversarial Trajectory 2025-12-01
Show

Multi-Object Tracking (MOT) is a critical task in computer vision, with applications ranging from surveillance systems to autonomous driving. However, threats to MOT algorithms have not yet been widely studied. In particular, incorrect association between the tracked objects and their assigned IDs can lead to severe consequences, such as wrong trajectory predictions. Previous attacks against MOT either focused on hijacking the trackers of individual objects or on manipulating the tracker IDs in MOT by attacking the integrated object detection (OD) module in the digital domain; such attacks are model-specific, non-robust, and only able to affect specific samples in offline datasets. In this paper, we present AdvTraj, the first online and physical ID-manipulation attack against tracking-by-detection MOT, in which an attacker uses adversarial trajectories to transfer its ID to a targeted object to confuse the tracking system, without attacking OD. Our simulation results in CARLA show that AdvTraj can fool ID assignments with a 100% success rate in various scenarios for white-box attacks against SORT, which also have high attack transferability (up to 93% attack success rate) against state-of-the-art (SOTA) MOT algorithms due to their common design principles. We characterize the patterns of trajectories generated by AdvTraj and propose two universal adversarial maneuvers that can be performed by a human walker/driver in daily scenarios. Our work reveals under-explored weaknesses in the object association phase of SOTA MOT systems, and provides insights into enhancing the robustness of such systems.

Accepted to Annual Computer Security Applications Conference (ACSAC) 2024

PRIMT: Preference-based Reinforcement Learning with Multimodal Feedback and Trajectory Synthesis from Foundation Models 2025-12-01
Show

Preference-based reinforcement learning (PbRL) has emerged as a promising paradigm for teaching robots complex behaviors without reward engineering. However, its effectiveness is often limited by two critical challenges: the reliance on extensive human input and the inherent difficulties in resolving query ambiguity and credit assignment during reward learning. In this paper, we introduce PRIMT, a PbRL framework designed to overcome these challenges by leveraging foundation models (FMs) for multimodal synthetic feedback and trajectory synthesis. Unlike prior approaches that rely on single-modality FM evaluations, PRIMT employs a hierarchical neuro-symbolic fusion strategy, integrating the complementary strengths of large language models and vision-language models in evaluating robot behaviors for more reliable and comprehensive feedback. PRIMT also incorporates foresight trajectory generation, which reduces early-stage query ambiguity by warm-starting the trajectory buffer with bootstrapped samples, and hindsight trajectory augmentation, which enables counterfactual reasoning with a causal auxiliary loss to improve credit assignment. We evaluate PRIMT on 2 locomotion and 6 manipulation tasks on various benchmarks, demonstrating superior performance over FM-based and scripted baselines.

A Unified Framework for Probabilistic Dynamic-, Trajectory- and Vision-based Virtual Fixtures 2025-12-01
Show

Probabilistic Virtual Fixtures (VFs) enable the adaptive selection of the most suitable haptic feedback for each phase of a task, based on learned or perceived uncertainty. While keeping the human in the loop remains essential, for instance, to ensure high precision, partial automation of certain task phases is critical for productivity. We present a unified framework for probabilistic VFs that seamlessly switches between manual fixtures, semi-automated fixtures (with the human handling precise tasks), and full autonomy. We introduce a novel probabilistic Dynamical System-based VF for coarse guidance, enabling the robot to autonomously complete certain task phases while keeping the human operator in the loop. For tasks requiring precise guidance, we extend probabilistic position-based trajectory fixtures with automation allowing for seamless human interaction as well as geometry-awareness and optimal impedance gains. For manual tasks requiring very precise guidance, we also extend visual servoing fixtures with the same geometry-awareness and impedance behavior. We validate our approach experimentally on different robots, showcasing multiple operation modes and the ease of programming fixtures.

for the supplementary video, see https://www.youtube.com/watch?v=vUXzcpMbMnY

Capturing Context-Aware Route Choice Semantics for Trajectory Representation Learning 2025-12-01
Show

Trajectory representation learning (TRL) aims to encode raw trajectory data into low-dimensional embeddings for downstream tasks such as travel time estimation, mobility prediction, and trajectory similarity analysis. From a behavioral perspective, a trajectory reflects a sequence of route choices within an urban environment. However, most existing TRL methods ignore this underlying decision-making process and instead treat trajectories as static, passive spatiotemporal sequences, thereby limiting the semantic richness of the learned representations. To bridge this gap, we propose CORE, a TRL framework that integrates context-aware route choice semantics into trajectory embeddings. CORE first incorporates a multi-granular Environment Perception Module, which leverages large language models (LLMs) to distill environmental semantics from point of interest (POI) distributions, thereby constructing a context-enriched road network. Building upon this backbone, CORE employs a Route Choice Encoder with a mixture-of-experts (MoE) architecture, which captures route choice patterns by jointly leveraging the context-enriched road network and navigational factors. Finally, a Transformer encoder aggregates the route-choice-aware representations into a global trajectory embedding. Extensive experiments on 4 real-world datasets across 6 downstream tasks demonstrate that CORE consistently outperforms 12 state-of-the-art TRL methods, achieving an average improvement of 9.79% over the best-performing baseline. Our code is available at https://github.com/caoji2001/CORE.

Graph Neural Networks

Title Date Abstract Comment
Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation 2025-12-19
Show

Effectively capturing long-range interactions remains a fundamental yet unresolved challenge in graph neural network (GNN) research, critical for applications across diverse fields of science. To systematically address this, we introduce ECHO (Evaluating Communication over long HOps), a novel benchmark specifically designed to rigorously assess the capabilities of GNNs in handling very long-range graph propagation. ECHO includes three synthetic graph tasks, namely single-source shortest paths, node eccentricity, and graph diameter, each constructed over diverse and structurally challenging topologies intentionally designed to introduce significant information bottlenecks. ECHO also includes two real-world datasets, ECHO-Charge and ECHO-Energy, which define chemically grounded benchmarks for predicting atomic partial charges and molecular total energies, respectively, with reference computations obtained at the density functional theory (DFT) level. Both tasks inherently depend on capturing complex long-range molecular interactions. Our extensive benchmarking of popular GNN architectures reveals clear performance gaps, emphasizing the difficulty of true long-range propagation and highlighting design choices capable of overcoming inherent limitations. ECHO thereby sets a new standard for evaluating long-range information propagation, also providing a compelling example for its need in AI for science.

STAGNet: A Spatio-Temporal Graph and LSTM Framework for Accident Anticipation 2025-12-19
Show

Accident prediction and timely preventive actions improve road safety by reducing the risk of injury to road users and minimizing property damage. Hence, they are critical components of advanced driver assistance systems (ADAS) and autonomous vehicles. While many existing systems depend on multiple sensors such as LiDAR, radar, and GPS, relying solely on dash-cam videos presents a more challenging, yet more cost-effective and easily deployable solution. In this work, we incorporate improved spatio-temporal features and aggregate them through a recurrent network to enhance state-of-the-art graph neural networks for predicting accidents from dash-cam videos. Experiments using three publicly available datasets (DAD, DoTA and DADA) show that our proposed STAGNet model achieves higher average precision and mean time-to-accident scores than previous methods, both when cross-validated on a given dataset and when trained and tested on different datasets.

Published in IEEE Access

Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting 2025-12-19
Show

The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.
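
The central idea, adding a stationary distance-decay prior to the attention logits while letting QK^T carry the non-stationary residual, can be sketched as follows. The exponential kernel and single learnable range are assumptions for illustration; the paper's kernel may differ.

```python
import torch

def spatially_biased_attention(Q, K, V, dists, log_ell):
    """Self-attention with a geostatistical prior bias (sketch).

    Subtracts d_ij / ell from the attention logits so nearby sensors
    attend to each other more strongly, while the data-driven QK^T term
    models residual structure. ell = exp(log_ell) is a learnable range
    parameter, recoverable end-to-end ("Deep Variography").
    Q, K, V: (n, d) tensors; dists: (n, n) pairwise sensor distances.
    """
    d = Q.shape[-1]
    logits = Q @ K.transpose(-1, -2) / d**0.5  # data-driven term
    logits = logits - dists / log_ell.exp()    # stationary spatial prior bias
    return torch.softmax(logits, dim=-1) @ V
```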

Logical Characterizations of GNNs with Mean Aggregation 2025-12-19
Show

We study the expressive power of graph neural networks (GNNs) with mean as the aggregation function, with the following results. In the non-uniform setting, such GNNs have exactly the same expressive power as ratio modal logic, which has modal operators expressing that at least a certain ratio of the successors of a vertex satisfies a specified property. In the uniform setting, the expressive power relative to MSO is exactly that of modal logic, and thus identical to the (absolute) expressive power of GNNs with max aggregation. The proof, however, depends on constructions that are not satisfactory from a practical perspective. This leads us to make the natural assumptions that combination functions are continuous and classification functions are thresholds. The resulting class of GNNs with mean aggregation turns out to be much less expressive: relative to MSO and in the uniform setting, it has the same expressive power as alternation-free modal logic. This is in contrast to the expressive power of GNNs with max and sum aggregation, which is not affected by these assumptions.

26 pages, extended version of paper to appear in AAAI 2026

A lightweight Spatial-Temporal Graph Neural Network for Long-term Time Series Forecasting 2025-12-19
Show

We propose Lite-STGNN, a lightweight spatial-temporal graph neural network for long-term multivariate forecasting that integrates decomposition-based temporal modeling with learnable sparse graph structure. The temporal module applies trend-seasonal decomposition, while the spatial module performs message passing with low-rank Top-$K$ adjacency learning and conservative horizon-wise gating, enabling spatial corrections that enhance a strong linear baseline. Lite-STGNN achieves state-of-the-art accuracy on four benchmark datasets for horizons up to 720 steps, while being parameter-efficient and substantially faster to train than transformer-based methods. Ablation studies show that the spatial module yields 4.6% improvement over the temporal baseline, Top-$K$ enhances locality by 3.3%, and learned adjacency matrices reveal domain-specific interaction dynamics. Lite-STGNN thus offers a compact, interpretable, and efficient framework for long-term multivariate time series forecasting.
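For intuition, here is a hedged sketch of the two ingredients named above, a moving-average trend-seasonal decomposition and a low-rank Top-$K$ learned adjacency; the kernel size, rank, and function names are illustrative choices of mine, not the paper's code:

```python
import torch
import torch.nn.functional as F

def decompose(x, kernel=25):
    # x: (B, T, N) multivariate series; trend = centered moving average,
    # seasonal = residual (kernel must be odd to preserve length T)
    trend = F.avg_pool1d(x.transpose(1, 2), kernel, stride=1,
                         padding=kernel // 2).transpose(1, 2)
    return trend, x - trend

def topk_adjacency(emb, k=5):
    # emb: (N, r) low-rank node embeddings; keep each node's k strongest
    # links and renormalize them, yielding a sparse learned graph
    scores = emb @ emb.t()
    vals, idx = scores.topk(k, dim=-1)
    return torch.zeros_like(scores).scatter_(-1, idx, F.softmax(vals, dim=-1))

trend, seasonal = decompose(torch.randn(4, 96, 7))   # 7 variables, 96 steps
adj = topk_adjacency(torch.randn(7, 4))              # rank-4 embeddings
```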

9 pages, 5 figures, 2 tables. Accepted for presentation at the 18th International Conference on Agents and Artificial Intelligence (ICAART 2026), Marbella, Spain

Adaptive Graph Pruning with Sudden-Events Evaluation for Traffic Prediction using Online Semi-Decentralized ST-GNNs 2025-12-19
Show

Spatio-Temporal Graph Neural Networks (ST-GNNs) are well-suited for processing high-frequency data streams from geographically distributed sensors in smart mobility systems. However, their deployment at the edge across distributed compute nodes (cloudlets) creates substantial communication overhead due to repeated transmission of overlapping node features between neighbouring cloudlets. To address this, we propose an adaptive pruning algorithm that dynamically filters redundant neighbour features while preserving the most informative spatial context for prediction. The algorithm adjusts pruning rates based on recent model performance, allowing each cloudlet to focus on regions experiencing traffic changes without compromising accuracy. Additionally, we introduce the Sudden Event Prediction Accuracy (SEPA), a novel event-focused metric designed to measure responsiveness to traffic slowdowns and recoveries, which are often missed by standard error metrics. We evaluate our approach in an online semi-decentralized setting with traditional FL, server-free FL, and Gossip Learning on two large-scale traffic datasets, PeMS-BAY and PeMSD7-M, across short-, mid-, and long-term prediction horizons. Experiments show that, in contrast to standard metrics, SEPA exposes the true value of spatial connectivity in predicting dynamic and irregular traffic. Our adaptive pruning algorithm maintains prediction accuracy while significantly lowering communication cost in all online semi-decentralized settings, demonstrating that communication can be reduced without compromising responsiveness to critical traffic events.

19 pages, 6 figures, 5 tables, journal

From Priors to Predictions: Explaining and Visualizing Human Reasoning in a Graph Neural Network Framework 2025-12-19
Show

Humans excel at solving novel reasoning problems from minimal exposure, guided by inductive biases: assumptions about which entities and relationships matter. Yet the computational form of these biases and their neural implementation remain poorly understood. We introduce a framework that combines Graph Theory and Graph Neural Networks (GNNs) to formalize inductive biases as explicit, manipulable priors over structure and abstraction. Using a human behavioral dataset adapted from the Abstraction and Reasoning Corpus (ARC), we show that differences in graph-based priors can explain individual differences in human solutions. Our method includes an optimization pipeline that searches over graph configurations, varying edge connectivity and node abstraction, and a visualization approach that identifies the computational graph, the subset of nodes and edges most critical to a model's prediction. Systematic ablation reveals how generalization depends on specific prior structures and internal processing, exposing why human-like errors emerge from incorrect or incomplete priors. This work provides a principled, interpretable framework for modeling the representational assumptions and computational dynamics underlying generalization, offering new insights into human reasoning and a foundation for more human-aligned AI systems.

44 pages, 7 figures, 3 suppl figures

Beyond Force Metrics: Pre-Training MLFFs for Stable MD Simulations 2025-12-19
Show

Machine-learning force fields (MLFFs) have emerged as a promising solution for speeding up ab initio molecular dynamics (MD) simulations, where accurate force predictions are critical but often computationally expensive. In this work, we employ GemNet-T, a graph neural network model, as an MLFF and investigate two training strategies: (1) direct training on MD17 (10K samples) without pre-training, and (2) pre-training on the large-scale OC20 dataset followed by fine-tuning on MD17 (10K). While both approaches achieve low force mean absolute errors (MAEs), reaching 5 meV/Å per atom, we find that lower force errors do not necessarily guarantee stable MD simulations. Notably, the pre-trained GemNet-T model yields significantly improved simulation stability, sustaining trajectories up to three times longer than the model trained from scratch. By analyzing local properties of the learned force fields, we find that pre-training produces more structured latent representations, smoother force responses to local geometric changes, and more consistent force differences between nearby configurations, all of which contribute to more stable and reliable MD simulations. These findings underscore the value of pre-training on large, diverse datasets to capture complex molecular interactions and highlight that force MAE alone is not always a sufficient metric of MD simulation stability.

DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations 2025-12-18
Show

Biological systems can form complex three-dimensional structures through the collective behavior of identical agents -- cells that follow the same internal rules and communicate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an attention-based SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on the 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and rigid-body transformations. To enforce full SO(3) invariance (invariant to rotations yet sensitive to reflections), we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. This results in a bilevel problem, with the inner loop optimizing a unit quaternion for the best alignment and the outer loop updating the agent model. We compute gradients through the alignment step using implicit differentiation. We perform systematic benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of shapes -- from simple ellipsoids to complex morphologies -- using only minimal spatial cues.

Dirichlet Meets Horvitz and Thompson: Estimating Homophily in Large Networks via Sampling 2025-12-18
Show

Assessing homophily in large-scale networks is central to understanding structural regularities in graphs, and thus informs the choice of models (such as graph neural networks) adopted to learn from network data. Evaluation of smoothness metrics requires access to the entire network topology and node features, which may be impractical in several large-scale, dynamic, resource-limited, or privacy-constrained settings. In this work, we propose a sampling-based framework to estimate homophily via the Dirichlet energy (Laplacian-based total variation) of graph signals, leveraging the Horvitz-Thompson (HT) estimator for unbiased inference from partial graph observations. The Dirichlet energy is a so-termed "total" (a sum of squared nodal feature deviations over graph edges); hence, it is estimable under general network sampling designs for which edge-inclusion probabilities can be analytically derived and used as weights in the proposed HT estimator. We establish that the Dirichlet energy can be consistently estimated from sampled graphs, and empirically study other heterophily measures as well. Experiments on several heterophilic benchmark datasets demonstrate the effectiveness of the proposed HT estimators in reliably capturing homophilic structure (or lack thereof) from sampled network measurements.
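Because the Dirichlet energy is an edge total, the Horvitz-Thompson construction is easy to illustrate numerically. A minimal sketch of my own, assuming Bernoulli edge sampling with known inclusion probability pi (the paper covers more general designs):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.1]
x = rng.normal(size=n)                                  # one graph signal

energy = sum((x[i] - x[j]) ** 2 for i, j in edges)      # exact Dirichlet energy

pi = 0.3                                                # edge-inclusion probability
sampled = [e for e in edges if rng.random() < pi]
ht = sum((x[i] - x[j]) ** 2 / pi for i, j in sampled)   # unbiased HT estimate
print(f"exact: {energy:.2f}  HT estimate: {ht:.2f}")
```

Dividing each sampled edge's contribution by its inclusion probability makes the estimator unbiased for the full-graph total, which is the core idea the paper develops for general sampling designs.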

Microsoft Academic Graph Information Retrieval for Research Recommendation and Assistance 2025-12-18
Show

In today's information-driven world, access to scientific publications has become increasingly easy. At the same time, filtering through the massive volume of available research has become more challenging than ever. Graph Neural Networks (GNNs) and graph attention mechanisms have shown strong effectiveness in searching large-scale information databases, particularly when combined with modern large language models. In this paper, we propose an Attention-Based Subgraph Retriever, a GNN-as-retriever model that applies attention-based pruning to extract a refined subgraph, which is then passed to a large language model for advanced knowledge reasoning.

5 pages, 3 figures
Riemannian Stochastic Interpolants for Amorphous Particle Systems 2025-12-18
Show

Modern generative models hold great promise for accelerating diverse tasks involving the simulation of physical systems, but they must be adapted to the specific constraints of each domain. Significant progress has been made for biomolecules and crystalline materials. Here, we address amorphous materials (glasses), which are disordered particle systems lacking atomic periodicity. Sampling equilibrium configurations of glass-forming materials is a notoriously slow and difficult task. This obstacle could be overcome by developing a generative framework capable of producing equilibrium configurations with well-defined likelihoods. In this work, we address this challenge by leveraging an equivariant Riemannian stochastic interpolation framework, which combines Riemannian stochastic interpolants with equivariant flow matching. Our method rigorously incorporates periodic boundary conditions and the symmetries of multi-component particle systems, adapting an equivariant graph neural network to operate directly on the torus. Our numerical experiments on model amorphous systems demonstrate that enforcing geometric and symmetry constraints significantly improves generative performance.

GFLAN: Generative Functional Layouts 2025-12-18
Show

Automated floor plan generation lies at the intersection of combinatorial search, geometric constraint satisfaction, and functional design requirements -- a confluence that has historically resisted a unified computational treatment. While recent deep learning approaches have improved the state of the art, they often struggle to capture architectural reasoning: the precedence of topological relationships over geometric instantiation, the propagation of functional constraints through adjacency networks, and the emergence of circulation patterns from local connectivity decisions. To address these fundamental challenges, this paper introduces GFLAN, a generative framework that restructures floor plan synthesis through explicit factorization into topological planning and geometric realization. Given a single exterior boundary and a front-door location, our approach departs from direct pixel-to-pixel or wall-tracing generation in favor of a principled two-stage decomposition. Stage A employs a specialized convolutional architecture with dual encoders -- separating invariant spatial context from evolving layout state -- to sequentially allocate room centroids within the building envelope via discrete probability maps over feasible placements. Stage B constructs a heterogeneous graph linking room nodes to boundary vertices, then applies a Transformer-augmented graph neural network (GNN) that jointly regresses room boundaries.

21 pages, 15 figures
Sharpness-aware Federated Graph Learning 2025-12-18
Show

One of many impediments to applying graph neural networks (GNNs) to large-scale real-world graph data is the challenge of centralized training, which requires aggregating data from different organizations, raising privacy concerns. Federated graph learning (FGL) addresses this by enabling collaborative GNN model training without sharing private data. However, a core challenge in FGL systems is the variation in local training data distributions among clients, known as the data heterogeneity problem. Most existing solutions suffer from two problems: (1) The typical optimizer based on empirical risk minimization tends to cause local models to fall into sharp valleys and weakens their generalization to out-of-distribution graph data. (2) The prevalent dimensional collapse in the learned representations of local graph data has an adverse impact on the classification capacity of the GNN model. To this end, we formulate a novel optimization objective that is aware of the sharpness (i.e., the curvature of the loss surface) of local GNN models. By minimizing the loss function and its sharpness simultaneously, we seek out model parameters in a flat region with uniformly low loss values, thus improving the generalization over heterogeneous data. By introducing a regularizer based on the correlation matrix of local representations, we relax the correlations of representations generated by individual local graph samples, so as to alleviate the dimensional collapse of the learned model. The proposed Sharpness-aware fEderated grAph Learning (SEAL) algorithm can enhance the classification accuracy and generalization ability of local GNN models in federated graph learning. Experimental studies on several graph classification benchmarks show that SEAL consistently outperforms SOTA FGL baselines and provides gains for more participants.

Accepted by WSDM'26
Coarse-to-Fine Open-Set Graph Node Classification with Large Language Models 2025-12-18
Show

Developing open-set classification methods capable of classifying in-distribution (ID) data while detecting out-of-distribution (OOD) samples is essential for deploying graph neural networks (GNNs) in open-world scenarios. Existing methods typically treat all OOD samples as a single class, despite real-world applications, especially high-stake settings such as fraud detection and medical diagnosis, demanding deeper insights into OOD samples, including their probable labels. This raises a critical question: can OOD detection be extended to OOD classification without true label information? To address this question, we propose a Coarse-to-Fine open-set Classification (CFC) framework that leverages large language models (LLMs) for graph datasets. CFC consists of three key components: a coarse classifier that uses LLM prompts for OOD detection and outlier label generation, a GNN-based fine classifier trained with OOD samples identified by the coarse classifier for enhanced OOD detection and ID classification, and refined OOD classification achieved through LLM prompts and post-processed OOD labels. Unlike methods that rely on synthetic or auxiliary OOD samples, CFC employs semantic OOD instances that are genuinely out-of-distribution based on their inherent meaning, improving interpretability and practical utility. Experimental results show that CFC improves OOD detection by ten percent over state-of-the-art methods on graph and text domains and achieves up to seventy percent accuracy in OOD classification on graph datasets.

Accepted to AAAI 2026

The Evolution of Reranking Models in Information Retrieval: From Heuristic Methods to Large Language Models 2025-12-18
Show

Reranking is a critical stage in contemporary information retrieval (IR) systems, improving the relevance of the user-presented final results by honing initial candidate sets. This paper offers a thorough guide to the changing reranker landscape and a clear view of the advancements made in reranking methods. We present a comprehensive survey of reranking models employed in IR, particularly within modern Retrieval Augmented Generation (RAG) pipelines, where retrieved documents notably influence output quality. We embark on a chronological journey through the historical trajectory of reranking techniques, starting with foundational approaches, before exploring the wide range of sophisticated neural network architectures such as cross-encoders, sequence-generation models like T5, and Graph Neural Networks (GNNs) utilized for structural information. Recognizing the computational cost of advancing neural rerankers, we analyze techniques for enhancing efficiency, notably knowledge distillation for creating competitive, lighter alternatives. Furthermore, we map the emerging territory of integrating Large Language Models (LLMs) in reranking, examining novel prompting strategies and fine-tuning tactics. This survey seeks to elucidate the fundamental ideas, relative effectiveness, computational features, and real-world trade-offs of various reranking strategies, providing a structured synthesis of the diverse reranking paradigms and highlighting their underlying principles and comparative strengths and weaknesses.

15 pages, 1 figure, Accepted in CLNLP'25

A Multi-scale Fused Graph Neural Network with Inter-view Contrastive Learning for Spatial Transcriptomics Data Clustering 2025-12-18
Show

Spatial transcriptomics enables genome-wide expression analysis within native tissue context, yet identifying spatial domains remains challenging due to complex gene-spatial interactions. Existing methods typically process spatial and feature views separately, fusing only at output level - an "encode-separately, fuse-late" paradigm that limits multi-scale semantic capture and cross-view interaction. Accordingly, stMFG is proposed, a multi-scale interactive fusion graph network that introduces layer-wise cross-view attention to dynamically integrate spatial and gene features after each convolution. The model combines cross-view contrastive learning with spatial constraints to enhance discriminability while maintaining spatial continuity. On DLPFC and breast cancer datasets, stMFG outperforms state-of-the-art methods, achieving up to 14% ARI improvement on certain slices.

15 pages, 3 figures
A Multimodal Approach to Alzheimer's Diagnosis: Geometric Insights from Cube Copying and Cognitive Assessments 2025-12-18
Show

Early and accessible detection of Alzheimer's disease (AD) remains a critical clinical challenge, and cube-copying tasks offer a simple yet informative assessment of visuospatial function. This work proposes a multimodal framework that converts hand-drawn cube sketches into graph-structured representations capturing geometric and topological properties, and integrates these features with demographic information and neuropsychological test (NPT) scores for AD classification. Cube drawings are modeled as graphs with node features encoding spatial coordinates, local graphlet-based topology, and angular geometry, which are processed using graph neural networks and fused with age, education, and NPT features in a late-fusion model. Experimental results show that graph-based representations provide a strong unimodal baseline and substantially outperform pixel-based convolutional models, while multimodal integration further improves performance and robustness to class imbalance. SHAP-based interpretability analysis identifies specific graphlet motifs and geometric distortions as key predictors, closely aligning with clinical observations of disorganized cube drawings in AD. Together, these results establish graph-based analysis of cube copying as an interpretable, non-invasive, and scalable approach for Alzheimer's disease screening.

SegGraph: Leveraging Graphs of SAM Segments for Few-Shot 3D Part Segmentation 2025-12-18
Show

This work presents a novel framework for few-shot 3D part segmentation. Recent advances have demonstrated the significant potential of 2D foundation models for low-shot 3D part segmentation. However, how to effectively aggregate 2D knowledge from foundation models into 3D remains an open problem. Existing methods either ignore geometric structures for 3D feature learning or neglect the high-quality grouping clues from SAM, leading to under-segmentation and inconsistent part labels. We devise a novel SAM segment graph-based propagation method, named SegGraph, to explicitly learn geometric features encoded within SAM's segmentation masks. Our method encodes geometric features by modeling mutual overlap and adjacency between segments while preserving intra-segment semantic consistency. We construct a segment graph, conceptually similar to an atlas, where nodes represent segments and edges capture their spatial relationships (overlap/adjacency). Each node adaptively modulates 2D foundation model features, which are then propagated via a graph neural network to learn global geometric structures. To enforce intra-segment semantic consistency, we map segment features to 3D points with a novel view-direction-weighted fusion attenuating contributions from low-quality segments. Extensive experiments on PartNet-E demonstrate that our method outperforms all competing baselines by at least 6.9 percent mIoU. Further analysis reveals that SegGraph achieves particularly strong performance on small components and part boundaries, demonstrating its superior geometric understanding. The code is available at: https://github.com/YueyangHu2000/SegGraph.

Graph Neural Networks for Interferometer Simulations 2025-12-18
Show

In recent years, graph neural networks (GNNs) have shown tremendous promise in solving problems in high energy physics, materials science, and fluid dynamics. In this work, we introduce a new application for GNNs in the physical sciences: instrumentation design. As a case study, we apply GNNs to simulate models of the Laser Interferometer Gravitational-Wave Observatory (LIGO) and show that they are capable of accurately capturing the complex optical physics at play, while achieving runtimes 815 times faster than state-of-the-art simulation packages. We discuss the unique challenges this problem provides for machine learning models. In addition, we provide a dataset of high-fidelity optical physics simulations for three interferometer topologies, which can be used as a benchmarking suite for future work in this direction.

Learning Multimodal Embeddings for Traffic Accident Prediction and Causal Estimation 2025-12-17
Show

We consider analyzing traffic accident patterns using both road network data and satellite images aligned to road graph nodes. Previous work for predicting accident occurrences relies primarily on road network structural features while overlooking physical and environmental information from the road surface and its surroundings. In this work, we construct a large multimodal dataset across six U.S. states, containing nine million traffic accident records from official sources, and one million high-resolution satellite images for each node of the road network. Additionally, every node is annotated with features such as the region's weather statistics and road type (e.g., residential vs. motorway), and each edge is annotated with traffic volume information (i.e., Average Annual Daily Traffic). Utilizing this dataset, we conduct a comprehensive evaluation of multimodal learning methods that integrate both visual and network embeddings. Our findings show that integrating both data modalities improves prediction accuracy, achieving an average AUROC of 90.1%, which is a 3.7% gain over graph neural network models that only utilize graph structures. With the improved embeddings, we conduct a causal analysis based on a matching estimator to estimate the key contributing factors influencing traffic accidents. We find that accident rates rise by 24% under higher precipitation, by 22% on higher-speed roads such as motorways, and by 29% due to seasonal patterns, after adjusting for other confounding factors. Ablation studies confirm that satellite imagery features are essential for achieving accurate prediction.

17 pages. To appear in KDD'26 Datasets

Uncovering Alzheimer's Disease Progression via SDE-based Spatio-Temporal Graph Deep Learning on Longitudinal Brain Networks 2025-12-17
Show

Identifying objective neuroimaging biomarkers to forecast Alzheimer's disease (AD) progression is crucial for timely intervention. However, this task remains challenging due to the complex dysfunctions in the spatio-temporal characteristics of underlying brain networks, which are often overlooked by existing methods. To address these limitations, we develop an interpretable spatio-temporal graph neural network framework to predict future AD progression, leveraging dual Stochastic Differential Equations (SDEs) to model the irregularly-sampled longitudinal functional magnetic resonance imaging (fMRI) data. We validate our approach on two independent cohorts, including the Open Access Series of Imaging Studies (OASIS-3) and the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our framework effectively learns sparse regional and connective importance probabilities, enabling the identification of key brain circuit abnormalities associated with disease progression. Notably, we detect the parahippocampal cortex, prefrontal cortex, and parietal lobule as salient regions, with significant disruptions in the ventral attention, dorsal attention, and default mode networks. These abnormalities correlate strongly with longitudinal AD-related clinical symptoms. Moreover, our interpretability strategy reveals both established and novel neural systems-level and sex-specific biomarkers, offering new insights into the neurobiological mechanisms underlying AD progression. Our findings highlight the potential of spatio-temporal graph-based learning for early, individualized prediction of AD progression, even in the context of irregularly-sampled longitudinal imaging data.

Structure-Aligned Protein Language Model 2025-12-17
Show

Protein language models (pLMs) pre-trained on vast protein sequence databases excel at various downstream tasks but often lack the structural knowledge essential for some biological applications. To address this, we introduce a method to enrich pLMs with structural knowledge by leveraging pre-trained protein graph neural networks (pGNNs). First, a latent-level contrastive learning task aligns residue representations from pLMs with those from pGNNs across multiple proteins, injecting inter-protein structural information. Additionally, a physical-level task integrates intra-protein information by training pLMs to predict structure tokens. Together, the proposed dual-task framework effectively incorporates both inter- and intra-protein structural knowledge into pLMs. Given the variability in the quality of protein structures in PDB, we further introduce a residue loss selection module that uses a small model trained on high-quality structures to select reliable yet challenging residue losses for the pLM to learn. Applying our structure alignment method as a simple, lightweight post-training step to the state-of-the-art ESM2 and AMPLIFY yields notable performance gains. These improvements are consistent across a wide range of tasks, including substantial gains in deep mutational scanning (DMS) fitness prediction and a 59% increase in P@L for ESM2 650M contact prediction on CASP16. Furthermore, we demonstrate that these performance gains are robust, scaling with model sizes from 8M to 650M and extending to different downstream tasks.

28 pages, 16 figures, 9 tables

Exact Verification of Graph Neural Networks with Incremental Constraint Solving 2025-12-17
Show

Graph neural networks (GNNs) are increasingly employed in high-stakes applications, such as fraud detection or healthcare, but are susceptible to adversarial attacks. A number of techniques have been proposed to provide adversarial robustness guarantees, but support for commonly used aggregation functions in message-passing GNNs is lacking. In this paper, we develop an exact (sound and complete) verification method for GNNs to compute guarantees against attribute and structural perturbations that involve edge addition or deletion, subject to budget constraints. Our method employs constraint solving with bound tightening, and iteratively solves a sequence of relaxed constraint satisfaction problems while relying on incremental solving capabilities of solvers to improve efficiency. We implement GNNev, a versatile exact verifier for message-passing neural networks, which supports three aggregation functions, sum, max and mean, with the latter two considered here for the first time. Extensive experimental evaluation of GNNev on real-world fraud datasets (Amazon and Yelp) and biochemical datasets (MUTAG and ENZYMES) demonstrates its usability and effectiveness, as well as superior performance for node classification and competitiveness on graph classification compared to existing exact verification tools on sum-aggregated GNNs.

Graph Contextual Reinforcement Learning for Efficient Directed Controller Synthesis 2025-12-17
Show

Controller synthesis is a formal method approach for automatically generating Labeled Transition System (LTS) controllers that satisfy specified properties. The efficiency of the synthesis process, however, is critically dependent on exploration policies. These policies often rely on fixed rules or strategies learned through reinforcement learning (RL) that consider only a limited set of current features. To address this limitation, this paper introduces GCRL, an approach that enhances RL-based methods by integrating Graph Neural Networks (GNNs). GCRL encodes the history of LTS exploration into a graph structure, allowing it to capture a broader, non-current-based context. In a comparative experiment against state-of-the-art methods, GCRL exhibited superior learning efficiency and generalization in four out of five benchmark domains; the exception was a domain characterized by high symmetry and strictly local interactions.

Improving Subgraph Matching by Combining Algorithms and Graph Neural Networks 2025-12-17
Show

Homomorphism is a key mapping technique between graphs that preserves their structure. Given a graph and a pattern, the subgraph homomorphism problem involves finding a mapping from the pattern to the graph, ensuring that adjacent vertices in the pattern are mapped to adjacent vertices in the graph. Unlike subgraph isomorphism, which requires a one-to-one mapping, homomorphism allows multiple vertices in the pattern to map to the same vertex in the graph, making it more complex. We propose HFrame, the first graph neural network-based framework for subgraph homomorphism, which integrates traditional algorithms with machine learning techniques. We demonstrate that HFrame outperforms standard graph neural networks by being able to distinguish more graph pairs where the pattern is not homomorphic to the graph. Additionally, we provide a generalization error bound for HFrame. Through experiments on both real-world and synthetic graphs, we show that HFrame is up to 101.91 times faster than exact matching algorithms and achieves an average accuracy of 0.962.

KDD 2025
Accelerating High-Throughput Catalyst Screening by Direct Generation of Equilibrium Adsorption Structures 2025-12-17
Show

The adsorption energy serves as a crucial descriptor for the large-scale screening of catalysts. Nevertheless, the limited distribution of training data for the extensively utilised machine learning interatomic potential (MLIP), predominantly sourced from near-equilibrium structures, results in unreliable adsorption structures and consequent adsorption energy predictions. In this context, we present DBCata, a deep generative model that integrates a periodic Brownian-bridge framework with an equivariant graph neural network to establish a low-dimensional transition manifold between unrelaxed and DFT-relaxed structures, without requiring explicit energy or force information. Upon training, DBCata effectively generates high-fidelity adsorption geometries, achieving an interatomic distance mean absolute error (DMAE) of 0.035 Å on the Catalysis-Hub dataset, nearly three times better than current state-of-the-art machine learning potential models. Moreover, the corresponding DFT accuracy can be improved to within 0.1 eV in 94% of instances by identifying and refining anomalous predictions through a hybrid chemical-heuristic and self-supervised outlier detection approach. We demonstrate that the remarkable performance of DBCata facilitates accelerated high-throughput computational screening for efficient alloy catalysts in the oxygen reduction reaction, highlighting the potential of DBCata as a powerful tool for catalyst design and optimisation.

Dual-Axis RCCL: Representation-Complete Convergent Learning for Organic Chemical Space 2025-12-17
Show

Machine learning is profoundly reshaping molecular and materials modeling; however, given the vast scale of chemical space (10^30-10^60), it remains an open scientific question whether models can achieve convergent learning across this space. We introduce a Dual-Axis Representation-Complete Convergent Learning (RCCL) strategy, enabled by a molecular representation that integrates graph convolutional network (GCN) encoding of local valence environments, grounded in modern valence bond theory, together with no-bridge graph (NBG) encoding of ring/cage topologies, providing a quantitative measure of chemical-space coverage. This framework formalizes representation completeness, establishing a principled basis for constructing datasets that support convergent learning for large models. Guided by this RCCL framework, we develop the FD25 dataset, systematically covering 13,302 local valence units and 165,726 ring/cage topologies, achieving near-complete combinatorial coverage of organic molecules with H/C/N/O/F elements. Graph neural networks trained on FD25 exhibit representation-complete convergent learning and strong out-of-distribution generalization, with an overall prediction error of approximately 1.0 kcal/mol MAE across external benchmarks. Our results establish a quantitative link between molecular representation, structural completeness, and model generalization, providing a foundation for interpretable, transferable, and data-efficient molecular intelligence.

33 pages, 10 figures
Convergent Privacy Framework for Multi-layer GNNs through Contractive Message Passing 2025-12-17
Show

Differential privacy (DP) has been integrated into graph neural networks (GNNs) to protect sensitive structural information, e.g., edges, nodes, and associated features across various applications. A prominent approach is to perturb the message-passing process, which forms the core of most GNN architectures. However, existing methods typically incur a privacy cost that grows linearly with the number of layers (e.g., GAP published in Usenix Security'23), ultimately requiring excessive noise to maintain a reasonable privacy level. This limitation becomes particularly problematic when multi-layer GNNs, which have shown better performance than one-layer GNN, are used to process graph data with sensitive information. In this paper, we theoretically establish that the privacy budget converges with respect to the number of layers by applying privacy amplification techniques to the message-passing process, exploiting the contractive properties inherent to standard GNN operations. Motivated by this analysis, we propose a simple yet effective Contractive Graph Layer (CGL) that ensures the contractiveness required for theoretical guarantees while preserving model utility. Our framework, CARIBOU, supports both training and inference, equipped with a contractive aggregation module, a privacy allocation module, and a privacy auditing module. Experimental evaluations demonstrate that CARIBOU significantly improves the privacy-utility trade-off and achieves superior performance in privacy auditing tasks.
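As a rough illustration of where the contractiveness could come from (an assumption of mine; the paper's CGL may be constructed differently), one can spectrally normalize the layer's linear map and aggregate with a symmetrically normalized adjacency, making every component 1-Lipschitz:

```python
import torch
import torch.nn as nn

class ContractiveLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # spectral_norm rescales the weight so its top singular value is 1
        self.lin = nn.utils.spectral_norm(nn.Linear(dim, dim, bias=False))

    def forward(self, x, adj_norm):
        # adj_norm: D^{-1/2} A D^{-1/2}, whose spectral norm is at most 1;
        # tanh is 1-Lipschitz, so the composed layer is non-expansive
        return torch.tanh(self.lin(adj_norm @ x))

layer = ContractiveLayer(16)
x, adj = torch.randn(10, 16), torch.eye(10)  # identity stands in for adj_norm
print(layer(x, adj).shape)                   # torch.Size([10, 16])
```

A stack of non-expansive layers cannot amplify the distance between neighboring inputs, which is the property that lets per-layer privacy costs converge rather than grow linearly with depth.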

31 pages
RELIC-GNN: Efficient State Registers Identification with Graph Neural Network for Reverse Engineering 2025-12-17
Show

Reverse engineering of gate-level netlists is critical for detecting Hardware Trojans and countering design piracy. The primary task of gate-level reverse engineering is to separate the control and data signals from the netlist, which is mainly realized by identifying state registers with topological comparison. However, these methods become inefficient for large-scale netlists. In this work, we propose RELIC-GNN, a graph neural network-based state register identification method, to address these issues. RELIC-GNN models the path structure of each register as a graph and generates the corresponding representation by considering node attributes and graph structure during training. The trained GNN model can then identify register types very efficiently. Experimental results show that RELIC-GNN achieves 100% recall, 30.49% precision, and 88.37% accuracy on average across different designs, a significant improvement over previous approaches.

WaveGNN: Integrating Graph Neural Networks and Transformers for Decay-Aware Classification of Irregular Clinical Time-Series 2025-12-17
Show

Clinical time series are often irregularly sampled, with varying sensor frequencies, missing observations, and misaligned timestamps. Prior approaches typically address these irregularities by interpolating data into regular sequences, thereby introducing bias, or by generating inconsistent and uninterpretable relationships across sensor measurements, complicating the accurate learning of both intra-series and inter-series dependencies. We introduce WaveGNN, a model that operates directly on irregular multivariate time series without interpolation or conversion to a regular representation. WaveGNN combines a decay-aware Transformer to capture intra-series dynamics with a sample-specific graph neural network that models both short-term and long-term inter-sensor relationships. Therefore, it generates a single, sparse, and interpretable graph per sample. Across multiple benchmark datasets (P12, P19, MIMIC-III, and PAM), WaveGNN delivers consistently strong performance, whereas other state-of-the-art baselines tend to perform well on some datasets or tasks but poorly on others. While WaveGNN does not necessarily surpass every method in every case, its consistency and robustness across diverse settings set it apart. Moreover, the learned graphs align well with known physiological structures, enhancing interpretability and supporting clinical decision-making.

Accepted to IEEE BigData 2025. Author-accepted manuscript

Improving Graph Neural Network Training, Defense and Hypergraph Partitioning via Adversarial Robustness Evaluation 2025-12-17
Show

Graph Neural Networks (GNNs) are a highly effective neural network architecture for processing graph-structured data. Unlike traditional neural networks that rely solely on the features of the data as input, GNNs leverage both the graph structure, which represents the relationships between data points, and the feature matrix of the data to optimize their feature representation. This unique capability enables GNNs to achieve superior performance across various tasks. However, it also makes GNNs more susceptible to noise and adversarial attacks from both the graph structure and data features, which can significantly increase the training difficulty and degrade their performance. Similarly, a hypergraph is a highly complex structure, and partitioning a hypergraph is a challenging task. This paper leverages spectral adversarial robustness evaluation to effectively address key challenges in complex-graph algorithms. By using spectral adversarial robustness evaluation to distinguish robust nodes from non-robust ones and treating them differently, we propose a training-set construction strategy that improves the training quality of GNNs. In addition, we develop algorithms to enhance both the adversarial robustness of GNNs and the performance of hypergraph partitioning. Experimental results show that this series of methods is highly effective.

Spatio-Temporal Graph Neural Network for Urban Spaces: Interpolating Citywide Traffic Volume 2025-12-16
Show

Graph Neural Networks have shown strong performance in traffic volume forecasting, particularly on highways and major arterial networks. Applying them to urban settings, however, presents unique challenges: urban networks exhibit greater structural diversity, traffic volumes are highly overdispersed with many zeros, the best way to account for spatial dependencies remains unclear, and sensor coverage is often very sparse. We introduce the Graph Neural Network for Urban Interpolation (GNNUI), a novel urban traffic volume estimation approach. GNNUI employs a masking algorithm to learn interpolation, integrates node features to capture functional roles, and uses a loss function tailored to zero-inflated traffic distributions. In addition to the model, we introduce two new open, large-scale urban traffic volume benchmarks, covering different transportation modes: Strava cycling data from Berlin and New York City taxi data. GNNUI outperforms recent interpolation methods, including graph-based ones, across metrics (MAE, RMSE, true-zero rate, Kullback-Leibler divergence) and remains robust from 90% down to 1% sensor coverage. For example, on the Strava dataset, the MAE increases only from 7.1 to 10.5, and on the Taxi dataset, from 23.0 to 40.4. These results demonstrate that GNNUI maintains strong performance despite extreme data scarcity, a common condition in real-world urban settings. We also examine how graph connectivity choices influence model accuracy.
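To make "a loss function tailored to zero-inflated traffic distributions" concrete, here is one standard form such a loss can take, a zero-inflated Poisson negative log-likelihood; this is my illustration of the general idea, not necessarily GNNUI's exact objective:

```python
import torch

def zero_inflated_poisson_nll(pi_logit, log_rate, y):
    # pi_logit: logit of P(structural zero); log_rate: log of the Poisson mean
    pi = torch.sigmoid(pi_logit)
    rate = log_rate.exp()
    pois_logpmf = y * log_rate - rate - torch.lgamma(y + 1)
    zero_ll = torch.log(pi + (1 - pi) * torch.exp(-rate) + 1e-8)
    nonzero_ll = torch.log(1 - pi + 1e-8) + pois_logpmf
    return -torch.where(y == 0, zero_ll, nonzero_ll).mean()

y = torch.tensor([0.0, 0.0, 3.0, 1.0])            # mostly-zero traffic counts
print(zero_inflated_poisson_nll(torch.zeros(4), torch.zeros(4), y))
```

The Bernoulli gate absorbs the excess zeros so the count component is not dragged toward zero, which matters when most road segments record no volume most of the time.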

ATLAS: Adaptive Topology-based Learning at Scale for Homophilic and Heterophilic Graphs 2025-12-16
Show

We present ATLAS (Adaptive Topology-based Learning at Scale for Homophilic and Heterophilic Graphs), a novel graph learning algorithm that addresses two important challenges in graph neural networks (GNNs). First, the accuracy of GNNs degrades when the graph is heterophilic. Second, iterative feature aggregation limits the scalability of GNNs to large graphs. We address these challenges by extracting topological information about graph communities at multiple levels of refinement, concatenating community assignments to the feature vector, and applying multilayer perceptrons (MLPs) to the resulting representation. This provides topological context about nodes and their neighborhoods without invoking aggregation. Because MLPs are typically more scalable than GNNs, our approach applies to large graphs without the need for sampling. Across a wide set of graphs, ATLAS achieves comparable accuracy to baseline methods, with gains as high as 20 percentage points over GCN for heterophilic graphs with negative structural bias and 11 percentage points over MLP for homophilic graphs. Furthermore, we show how multi-resolution community features systematically modulate performance in both homophilic and heterophilic settings, opening a principled path toward explainable graph learning.
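The recipe described above (compute community assignments at several resolutions, append them to the feature vector, train an MLP) is straightforward to prototype. A hedged sketch using NetworkX's Louvain implementation; the resolution values and function names are my choices, not necessarily ATLAS's:

```python
import networkx as nx
import numpy as np

def community_features(G, resolutions=(0.5, 1.0, 2.0), seed=0):
    # One community-id column per resolution level; coarser and finer
    # partitions give each node multi-scale topological context.
    cols = []
    for r in resolutions:
        comms = nx.community.louvain_communities(G, resolution=r, seed=seed)
        label = np.zeros(G.number_of_nodes(), dtype=int)
        for c, members in enumerate(comms):
            label[list(members)] = c
        cols.append(label)
    return np.stack(cols, axis=1)          # shape (N, len(resolutions))

G = nx.karate_club_graph()
X_topo = community_features(G)
# one-hot encode X_topo, concatenate with raw node features, then feed an MLP
```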

Preprint
GNN101: Visual Learning of Graph Neural Networks in Your Web Browser 2025-12-16
Show

Graph Neural Networks (GNNs) have achieved significant success across various applications. However, their complex structures and inner workings can be challenging for non-AI experts to understand. To address this issue, this study presents GNN 101, an educational visualization tool for interactive learning of GNNs. GNN 101 introduces a set of animated visualizations that seamlessly integrate mathematical formulas with visualizations via multiple levels of abstraction, including a model overview, layer operations, and detailed calculations. Users can easily switch between two complementary views: a node-link view that offers an intuitive understanding of the graph data, and a matrix view that provides a space-efficient and comprehensive overview of all features and their transformations across layers. GNN 101 was designed and developed based on close collaboration with four GNN experts and deployment in three GNN-related courses. We demonstrated the usability and effectiveness of GNN 101 via use cases and user studies with both GNN teaching assistants and students. To ensure broad educational access, GNN 101 is open-source and available directly in web browsers without requiring any installations.

A Roadmap for Applying Graph Neural Networks to Numerical Data: Insights from Cementitious Materials 2025-12-16
Show

Machine learning (ML) has been increasingly applied in concrete research to optimize performance and mixture design. However, one major challenge in applying ML to cementitious materials is the limited size and diversity of available databases. A promising solution is the development of multi-modal databases that integrate both numerical and graphical data. Conventional ML frameworks in cement research are typically restricted to a single data modality. Graph neural networks (GNNs) represent a new generation of neural architectures capable of learning from data structured as graphs, capturing relationships through irregular or topology-dependent connections rather than fixed spatial coordinates. While GNNs are inherently designed for graphical data, they can be adapted to extract correlations from numerical datasets and potentially embed physical laws directly into their architecture, enabling explainable and physics-informed predictions. This work is among the first few studies to implement GNNs to design concrete, with a particular emphasis on establishing a clear and reproducible pathway for converting tabular data into graph representations using the k-nearest neighbor (K-NN) approach. Model hyperparameters and feature selection are systematically optimized to enhance prediction performance. The GNN shows performance comparable to the benchmark random forest, which has been demonstrated by many studies to yield reliable predictions for cementitious materials. Overall, this study provides a foundational roadmap for transitioning from traditional ML to advanced AI architectures. The proposed framework establishes a strong foundation for future multi-modal and physics-informed GNN models capable of capturing complex material behaviors and accelerating the design and optimization of cementitious materials.
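To make the tabular-to-graph step concrete, here is a minimal K-NN construction in the spirit described above; the dataset shape and choice of k are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(100, 8)                    # 100 mixtures, 8 mix-design features
knn = NearestNeighbors(n_neighbors=6).fit(X)  # 6 = self + 5 neighbours
_, idx = knn.kneighbors(X)

# Edge list: connect each row to its 5 nearest neighbours (idx[:, 0] is self).
edges = [(i, int(j)) for i in range(len(X)) for j in idx[i, 1:]]
# `edges` plus the feature matrix X define the graph a GNN consumes,
# e.g. as a torch_geometric edge_index tensor.
```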

TomoGraphView: 3D Medical Image Classification with Omnidirectional Slice Representations and Graph Neural Networks 2025-12-16
Show

The sharp rise in medical tomography examinations has created a demand for automated systems that can reliably extract informative features for downstream tasks such as tumor characterization. Although 3D volumes contain richer information than individual slices, effective 3D classification remains difficult: volumetric data encode complex spatial dependencies, and the scarcity of large-scale 3D datasets has constrained progress toward 3D foundation models. As a result, many recent approaches rely on 2D vision foundation models trained on natural images, repurposing them as feature extractors for medical scans with surprisingly strong performance. Despite their practical success, current methods that apply 2D foundation models to 3D scans via slice-based decomposition remain fundamentally limited. Standard slicing along axial, sagittal, and coronal planes often fails to capture the true spatial extent of a structure when its orientation does not align with these canonical views. More critically, most approaches aggregate slice features independently, ignoring the underlying 3D geometry and losing spatial coherence across slices. To overcome these limitations, we propose TomoGraphView, a novel framework that integrates omnidirectional volume slicing with spherical graph-based feature aggregation. Instead of restricting the model to axial, sagittal, or coronal planes, our method samples both canonical and non-canonical cross-sections generated from uniformly distributed points on a sphere enclosing the volume. We publicly share our accessible code base at http://github.com/compai-lab/2025-MedIA-kiechle and provide a user-friendly library for omnidirectional volume slicing at https://pypi.org/project/OmniSlicer.

Preprint submitted to Medical Image Analysis (MedIA)

Property-Isometric Variational Autoencoders for Sequence Modeling and Design 2025-12-16
Show

Biological sequence design (DNA, RNA, or peptides) with desired functional properties has applications in discovering novel nanomaterials, biosensors, antimicrobial drugs, and beyond. One common challenge is optimizing complex high-dimensional properties such as target emission spectra of DNA-mediated fluorescent nanoparticles, photo and chemical stability, and antimicrobial activity of peptides across target microbes. Existing models rely on simple binary labels (e.g., binding/non-binding) rather than high-dimensional complex properties. To address this gap, we propose a geometry-preserving variational autoencoder framework, called PrIVAE, which learns latent sequence embeddings that respect the geometry of their property space. Specifically, we model the property space as a high-dimensional manifold that can be locally approximated by a nearest neighbor graph, given an appropriately defined distance measure. We employ the property graph to guide the sequence latent representations using (1) graph neural network encoder layers and (2) an isometric regularizer. PrIVAE learns a property-organized latent space that enables rational design of new sequences with desired properties by employing the trained decoder. We evaluate the utility of our framework for two generative tasks: (1) design of DNA sequences that template fluorescent metal nanoclusters and (2) design of antimicrobial peptides. The trained models retain high reconstruction accuracy while organizing the latent space according to properties. Beyond in silico experiments, we also employ sampled sequences for wet lab design of DNA nanoclusters, resulting in up to 16.1-fold enrichment of rare-property nanoclusters compared to their abundance in training data, demonstrating the practical utility of our framework.
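The isometry idea admits a compact sketch: penalize the mismatch between pairwise distances in latent space and in property space, so the latent embedding preserves the local geometry of the property manifold. This is my reading of the general idea, not PrIVAE's exact regularizer:

```python
import torch

def isometry_loss(z, p):
    # z: (B, d_latent) latent codes; p: (B, d_property) property vectors
    dz = torch.cdist(z, z)                 # pairwise latent distances
    dp = torch.cdist(p, p)                 # pairwise property distances
    return ((dz - dp) ** 2).mean()

z = torch.randn(16, 8, requires_grad=True)
p = torch.randn(16, 30)                    # e.g. emission spectra
isometry_loss(z, p).backward()             # add to the VAE objective with a weight
```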

23 pages, 6 figures, preprint

A Graph-Based Forensic Framework for Inferring Hardware Noise of Cloud Quantum Backend 2025-12-16
Show

Cloud quantum platforms give users access to many backends with different qubit technologies, coupling layouts, and noise levels. The execution of a circuit, however, depends on internal allocation and routing policies that are not observable to the user. A provider may redirect jobs to more error-prone regions to conserve resources, balance load or for other opaque reasons, causing degradation in fidelity while still presenting stale or averaged calibration data. This lack of transparency creates a security gap: users cannot verify whether their circuits were executed on the hardware for which they were charged. Forensic methods that infer backend behavior from user-visible artifacts are therefore becoming essential. In this work, we introduce a Graph Neural Network (GNN)-based forensic framework that predicts per-qubit and per-qubit link error rates of an unseen backend using only topology information and aggregated features extracted from transpiled circuits. We construct a dataset from several IBM 27-qubit devices, merge static calibration features with dynamic transpilation features and train separate GNN regressors for one- and two-qubit errors. At inference time, the model operates without access to calibration data from the target backend and reconstructs a complete error map from the features available to the user. Our results on the target backend show accurate recovery of backend error rate, with an average mismatch of approximately 22% for single-qubit errors and 18% for qubit-link errors. The model also exhibits strong ranking agreement, with the ordering induced by predicted error values closely matching that of the actual calibration errors, as reflected by high Spearman correlation. The framework consistently identifies weak links and high-noise qubits and remains robust under realistic temporal noise drift.

11 pages, 5 figures, conference

Online Multi-modal Root Cause Identification in Microservice Systems 2025-12-16
Show

Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems. Traditional data-driven RCA methods are typically limited to offline applications due to high computational demands, and existing online RCA methods handle only single-modal data, overlooking complex interactions in multi-modal systems. In this paper, we introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization. OCEAN employs a dilated convolutional neural network to capture long-term temporal dependencies and graph neural networks to learn causal relationships among system entities and key performance indicators. We further design a multi-factor attention mechanism to analyze and reassess the relationships among different metrics and log indicators/attributes for enhanced online causal graph learning. Additionally, a contrastive mutual information maximization-based graph fusion module is developed to effectively model the relationships across various modalities. Extensive experiments on three real-world datasets demonstrate the effectiveness and efficiency of our proposed method.

Accepted by BigData 2025

HeSRN: Representation Learning On Heterogeneous Graphs via Slot-Aware Retentive Network 2025-12-16
Show

Graph Transformers have recently achieved remarkable progress in graph representation learning by capturing long-range dependencies through self-attention. However, their quadratic computational complexity and inability to effectively model heterogeneous semantics severely limit their scalability and generalization on real-world heterogeneous graphs. To address these issues, we propose HeSRN, a novel Heterogeneous Slot-aware Retentive Network for efficient and expressive heterogeneous graph representation learning. HeSRN introduces a slot-aware structure encoder that explicitly disentangles node-type semantics by projecting heterogeneous features into independent slots and aligning their distributions through slot normalization and retention-based fusion, effectively mitigating the semantic entanglement caused by forced feature-space unification in previous Transformer-based models. Furthermore, we replace the self-attention mechanism with a retention-based encoder, which models structural and contextual dependencies in linear time complexity while maintaining strong expressive power. A heterogeneous retentive encoder is further employed to jointly capture both local structural signals and global heterogeneous semantics through multi-scale retention layers. Extensive experiments on four real-world heterogeneous graph datasets demonstrate that HeSRN consistently outperforms state-of-the-art heterogeneous graph neural networks and Graph Transformer baselines on node classification tasks, achieving superior accuracy with significantly lower computational complexity.

GALAX: Graph-Augmented Language Model for Explainable Reinforcement-Guided Subgraph Reasoning in Precision Medicine 2025-12-16
Show

In precision medicine, quantitative multi-omic features, topological context, and textual biological knowledge play vital roles in identifying disease-critical signaling pathways and targets. Existing pipelines capture only part of these: numerical omics ignore topological context, text-centric LLMs lack quantitatively grounded reasoning, and graph-only models underuse node semantics and the generalization of LLMs, limiting mechanistic interpretability. Although Process Reward Models (PRMs) aim to guide reasoning in LLMs, they remain limited by unreliable intermediate evaluation and vulnerability to reward hacking, along with high computational cost. These gaps motivate integrating quantitative multi-omic signals, topological structure with node annotations, and literature-scale text via LLMs, using subgraph reasoning as the principal bridge linking numeric evidence, topological knowledge, and language context. Therefore, we propose GALAX (Graph Augmented LAnguage model with eXplainability), an innovative framework that integrates pretrained Graph Neural Networks (GNNs) into Large Language Models (LLMs) via reinforcement learning guided by a Graph Process Reward Model (GPRM), which generates disease-relevant subgraphs in a step-wise manner, initiated by an LLM and iteratively evaluated by a pretrained GNN and a schema-based rule check, enabling process-level supervision without explicit labels. As an application, we also introduce Target-QA, a benchmark combining CRISPR-identified targets, multi-omic profiles, and biomedical graph knowledge across diverse cancer cell lines, which enables GNN pretraining for supervising step-wise graph construction and supports long-context reasoning over text-numeric graphs (TNGs), providing a scalable and biologically grounded framework for explainable, reinforcement-guided subgraph reasoning toward reliable and interpretable target discovery in precision medicine.

ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes 2025-12-16
Show

Purpose: Detailed surgical recognition is critical for advancing AI-assisted surgery, yet progress is hampered by high annotation costs, data scarcity, and a lack of interpretable models. While scene graphs offer a structured abstraction of surgical events, their full potential remains untapped. In this work, we introduce ProtoFlow, a novel framework that learns dynamic scene graph prototypes to model complex surgical workflows in an interpretable and robust manner. Methods: ProtoFlow leverages a graph neural network (GNN) encoder-decoder architecture that combines self-supervised pretraining for rich representation learning with a prototype-based fine-tuning stage. This process discovers and refines core prototypes that encapsulate recurring, clinically meaningful patterns of surgical interaction, forming an explainable foundation for workflow analysis. Results: We evaluate our approach on the fine-grained CAT-SG dataset. ProtoFlow not only outperforms standard GNN baselines in overall accuracy but also demonstrates exceptional robustness in limited-data, few-shot scenarios, maintaining strong performance when trained on as few as one surgical video. Our qualitative analyses further show that the learned prototypes successfully identify distinct surgical sub-techniques and provide clear, interpretable insights into workflow deviations and rare complications. Conclusion: By uniting robust representation learning with inherent explainability, ProtoFlow represents a significant step toward developing more transparent, reliable, and data-efficient AI systems, accelerating their potential for clinical adoption in surgical training, real-time decision support, and workflow optimization.

Multivariate Time Series Forecasting with Hybrid Euclidean-SPD Manifold Graph Neural Networks 2025-12-16
Show

Multivariate Time Series (MTS) forecasting plays a vital role in various real-world applications, such as traffic management and predictive maintenance. Existing approaches typically model MTS data in either Euclidean or Riemannian space, limiting their ability to capture the diverse geometric structures and complex spatio-temporal dependencies inherent in real-world data. To overcome this limitation, we propose the Hybrid Symmetric Positive-Definite Manifold Graph Neural Network (HSMGNN), a novel graph neural network-based model that captures data geometry within a hybrid Euclidean-Riemannian framework. To the best of our knowledge, this is the first work to leverage hybrid geometric representations for MTS forecasting, enabling expressive and comprehensive modeling of geometric properties. Specifically, we introduce a Submanifold-Cross-Segment (SCS) embedding to project input MTS into both Euclidean and Riemannian spaces, thereby capturing spatio-temporal variations across distinct geometric domains. To alleviate the high computational cost of Riemannian distance, we further design an Adaptive-Distance-Bank (ADB) layer with a trainable memory mechanism. Finally, a Fusion Graph Convolutional Network (FGCN) is devised to integrate features from the dual spaces via a learnable fusion operator for accurate prediction. Experiments on three benchmark datasets demonstrate that HSMGNN achieves up to a 13.8 percent improvement over state-of-the-art baselines in forecasting accuracy.

A Complete Guide to Spherical Equivariant Graph Transformers 2025-12-15
Show

Spherical equivariant graph neural networks (EGNNs) provide a principled framework for learning on three-dimensional molecular and biomolecular systems, where predictions must respect the rotational symmetries inherent in physics. These models extend traditional message-passing GNNs and Transformers by representing node and edge features as spherical tensors that transform under irreducible representations of the rotation group SO(3), ensuring that predictions change in physically meaningful ways under rotations of the input. This guide develops a complete, intuitive foundation for spherical equivariant modeling - from group representations and spherical harmonics, to tensor products, Clebsch-Gordan decomposition, and the construction of SO(3)-equivariant kernels. Building on this foundation, we construct the Tensor Field Network and SE(3)-Transformer architectures and explain how they perform equivariant message-passing and attention on geometric graphs. Through clear mathematical derivations and annotated code excerpts, this guide serves as a self-contained introduction for researchers and learners seeking to understand or implement spherical EGNNs for applications in chemistry, molecular property prediction, protein structure modeling, and generative modeling.
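
To make the equivariance property concrete, here is a tiny numerical check (illustrative, not taken from the guide): degree-1 (vector) features built from relative positions rotate exactly with the input:

```python
# Numerical check that l=1 (vector) messages built from relative positions
# are SO(3)-equivariant by construction: rotating the input rotates the output.
import torch

def vector_messages(pos):
    # Sum of relative position vectors into each node: a simple l=1 feature.
    rel = pos.unsqueeze(0) - pos.unsqueeze(1)   # (N, N, 3) pairwise differences
    return rel.sum(dim=1)                       # (N, 3)

def random_rotation():
    # QR decomposition of a random matrix yields an orthogonal matrix.
    q, _ = torch.linalg.qr(torch.randn(3, 3))
    return q * torch.sign(torch.det(q))         # force det = +1 (proper rotation)

pos = torch.randn(5, 3)
R = random_rotation()
out_rot = vector_messages(pos @ R.T)            # rotate inputs first
rot_out = vector_messages(pos) @ R.T            # rotate outputs after
print(torch.allclose(out_rot, rot_out, atol=1e-5))  # True: equivariance holds
```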

This paper is a technical version of the article originally published in Alchemy Bio (99 pages, 46 figures)

Topologically-Stabilized Graph Neural Networks: Empirical Robustness Across Domains 2025-12-15
Show

Graph Neural Networks (GNNs) have become the standard for graph representation learning but remain vulnerable to structural perturbations. We propose a novel framework that integrates persistent homology features with stability regularization to enhance robustness. Building on the stability theorems of persistent homology (Cohen-Steiner et al., 2007), our method combines GIN architectures with multi-scale topological features extracted from persistence images, enforced by Hiraoka-Kusano-inspired stability constraints. Across six diverse datasets spanning biochemical, social, and collaboration networks, our approach demonstrates exceptional robustness to edge perturbations while maintaining competitive accuracy. Notably, we observe minimal performance degradation (0-4% on most datasets) under perturbation, significantly outperforming baseline stability. Our work provides both a theoretically grounded and empirically validated approach to robust graph learning that aligns with recent advances in topological regularization.

LightTopoGAT: Enhancing Graph Attention Networks with Topological Features for Efficient Graph Classification 2025-12-15
Show

Graph Neural Networks have demonstrated significant success in graph classification tasks, yet they often require substantial computational resources and struggle to capture global graph properties effectively. We introduce LightTopoGAT, a lightweight graph attention network that enhances node features through topological augmentation by incorporating node degree and local clustering coefficient to improve graph representation learning. The proposed approach maintains parameter efficiency through streamlined attention mechanisms while integrating structural information that is typically overlooked by local message passing schemes. Through comprehensive experiments on three benchmark datasets, MUTAG, ENZYMES, and PROTEINS, we show that LightTopoGAT achieves superior performance compared to established baselines including GCN, GraphSAGE, and standard GAT, with a 6.6 percent improvement in accuracy on MUTAG and a 2.2 percent improvement on PROTEINS. Ablation studies further confirm that these performance gains arise directly from the inclusion of topological features, demonstrating a simple yet effective strategy for enhancing graph neural network performance without increasing architectural complexity.
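
The feature-augmentation step is simple to sketch; the following is an illustrative reconstruction (the function name and normalization choice are assumptions, not the authors' code):

```python
# Illustrative sketch of topological augmentation: append node degree and
# local clustering coefficient to existing node features.
import networkx as nx
import numpy as np

def augment_features(G: nx.Graph, X: np.ndarray) -> np.ndarray:
    deg = np.array([G.degree(n) for n in G.nodes()], dtype=float)
    clust = np.array([nx.clustering(G, n) for n in G.nodes()], dtype=float)
    deg = deg / (deg.max() + 1e-8)  # simple normalization (an assumption)
    return np.hstack([X, deg[:, None], clust[:, None]])

G = nx.karate_club_graph()
X = np.eye(G.number_of_nodes())   # one-hot placeholder features
X_aug = augment_features(G, X)
print(X_aug.shape)                # (34, 36): two extra topological columns
```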

9 pages
The prediction of the quality of results in Logic Synthesis using Transformer and Graph Neural Networks 2025-12-15
Show

In the logic synthesis stage, structure transformations in the synthesis tool need to be combined into optimization sequences and applied to the circuit to meet the specified circuit area and delay. However, logic synthesis optimization sequences are time-consuming to run, and predicting the quality of results (QoR) for a circuit under a given synthesis optimization sequence can help engineers find a better optimization sequence faster. In this work, we propose a deep learning method to predict the QoR of unseen circuit-optimization sequence pairs. Specifically, the structure transformations are translated into vectors by embedding methods, and advanced natural language processing (NLP) technology (Transformer) is used to extract the features of the optimization sequences. In addition, to enable the prediction process of the model to generalize from circuit to circuit, the graph representation of the circuit is expressed as an adjacency matrix and a feature matrix. Graph neural networks (GNNs) are used to extract the structural features of the circuits. For this problem, the Transformer and three typical GNNs are used. Furthermore, the Transformer and GNNs are adopted in a joint learning policy for QoR prediction on unseen circuit-optimization sequences. The methods resulting from the combination of the Transformer and GNNs are benchmarked. The experimental results show that the joint learning of the Transformer and GraphSage gives the best results. The Mean Absolute Error (MAE) of the predicted result is 0.412.

Meta Pruning via Graph Metanetworks : A Universal Meta Learning Framework for Network Pruning 2025-12-15
Show

We propose an entirely new meta-learning framework for network pruning. It is a general framework that can in principle be applied to almost all types of networks with all kinds of pruning, and it has great generality and transferability. Experiments show that it achieves outstanding results on many popular and representative pruning tasks (covering both CNNs and Transformers). Unlike prior works, which either rely on fixed, hand-crafted criteria to prune in a coarse manner or employ learning-to-prune approaches that require special training for each pruning run and lack generality, our framework learns complex pruning rules automatically via a neural network (metanetwork) and can prune without any special training. More specifically, we introduce the newly developed idea of metanetworks from meta-learning into pruning. A metanetwork is a network that takes another network as input and produces a modified network as output. In this paper, we first establish a bijective mapping between neural networks and graphs, and then employ a graph neural network as our metanetwork. We train a metanetwork that learns the pruning strategy automatically and can transform a network that is hard to prune into another network that is much easier to prune. Once the metanetwork is trained, pruning requires nothing more than a feedforward pass through the metanetwork and some standard finetuning to prune at state-of-the-art levels. Our code is available at https://github.com/Yewei-Liu/MetaPruning

Network-Wide Traffic Volume Estimation from Speed Profiles using a Spatio-Temporal Graph Neural Network with Directed Spatial Attention 2025-12-15
Show

Existing traffic volume estimation methods typically address either forecasting traffic on sensor-equipped roads or spatially imputing missing volumes using nearby sensors. While forecasting models generally disregard unmonitored roads by design, spatial imputation methods explicitly address network-wide estimation; yet this approach relies on volume data at inference time, limiting its applicability in sensor-scarce cities. Unlike traffic volume data, probe vehicle speeds and static road attributes are more broadly accessible and support full coverage of road segments in most urban networks. In this work, we present the Hybrid Directed-Attention Spatio-Temporal Graph Neural Network (HDA-STGNN), an inductive deep learning framework designed to tackle the network-wide volume estimation problem. Our approach leverages speed profiles, static road attributes, and road network topology to predict daily traffic volume profiles across all road segments in the network. To evaluate the effectiveness of our approach, we perform extensive ablation studies that demonstrate the model's capacity to capture complex spatio-temporal dependencies and highlight the value of topological information for accurate network-wide traffic volume estimation without relying on volume data at inference time.

OPAL: Operator-Programmed Algorithms for Landscape-Aware Black-Box Optimization 2025-12-14
Show

Black-box optimization often relies on evolutionary and swarm algorithms whose performance is highly problem dependent. We view an optimizer as a short program over a small vocabulary of search operators and learn this operator program separately for each problem instance. We instantiate this idea in Operator-Programmed Algorithms (OPAL), a landscape-aware framework for continuous black-box optimization that uses a small design budget with a standard differential evolution baseline to probe the landscape, builds a $k$-nearest neighbor graph over sampled points, and encodes this trajectory with a graph neural network. A meta-learner then maps the resulting representation to a phase-wise schedule of exploration, restart, and local search operators. On the CEC2017 test suite, a single meta-trained OPAL policy is statistically competitive with state-of-the-art adaptive differential evolution variants and achieves significant improvements over simpler baselines under nonparametric tests. Ablation studies on CEC2017 justify the choices for the design phase, the trajectory graph, and the operator-program representation, while the meta-components add only modest wall-clock overhead. Overall, the results indicate that operator-programmed, landscape-aware per-instance design is a practical way forward beyond ad hoc metaphor-based algorithms in black-box optimization.
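
The design-phase trajectory graph can be sketched in a few lines; the choice of k=8 and the feature layout below are assumptions for illustration, not OPAL's exact configuration:

```python
# Sketch of the trajectory-graph step: connect each sampled point to its
# k nearest neighbors so a GNN can encode the probed landscape.
import numpy as np
from sklearn.neighbors import kneighbors_graph

pts = np.random.rand(100, 5)            # design-phase samples in 5-D
fvals = np.random.rand(100, 1)          # objective values at the samples
A = kneighbors_graph(pts, n_neighbors=8, mode="connectivity")
features = np.hstack([pts, fvals])      # node features for the GNN
print(A.shape, features.shape)          # (100, 100) sparse, (100, 6)
```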

Source code, experiment scripts, and results are publicly available at https://github.com/junbolian/OPAL. The real-world application part hasn't been done yet

Modeling Authorial Style in Urdu Novels Using Character Interaction Graphs and Graph Neural Networks 2025-12-14
Show

Authorship analysis has traditionally focused on lexical and stylistic cues within text, while higher-level narrative structure remains underexplored, particularly for low-resource languages such as Urdu. This work proposes a graph-based framework that models Urdu novels as character interaction networks to examine whether authorial style can be inferred from narrative structure alone. Each novel is represented as a graph where nodes correspond to characters and edges denote their co-occurrence within narrative proximity. We systematically compare multiple graph representations, including global structural features, node-level semantic summaries, unsupervised graph embeddings, and supervised graph neural networks. Experiments on a dataset of 52 Urdu novels written by seven authors show that learned graph representations substantially outperform hand-crafted and unsupervised baselines, achieving up to 0.857 accuracy under a strict author-aware evaluation protocol.
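
A minimal reconstruction of the co-occurrence graph construction (window size, names, and weighting are illustrative assumptions):

```python
# Characters co-occurring within fixed-size token windows become connected,
# with edge weights counting co-occurrences.
from collections import Counter
import itertools
import networkx as nx

def build_interaction_graph(tokens, characters, window=50):
    G = nx.Graph()
    G.add_nodes_from(characters)
    weights = Counter()
    for i in range(0, len(tokens), window):
        present = {t for t in tokens[i:i + window] if t in characters}
        for a, b in itertools.combinations(sorted(present), 2):
            weights[(a, b)] += 1
    for (a, b), w in weights.items():
        G.add_edge(a, b, weight=w)
    return G

tokens = "Ali met Sara . Later Ali wrote to Omar while Sara slept".split()
G = build_interaction_graph(tokens, {"Ali", "Sara", "Omar"}, window=6)
print(G.edges(data=True))
```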

6 pages
Shoot from the HIP: Hessian Interatomic Potentials without derivatives 2025-12-14
Show

Fundamental tasks in computational chemistry, from transition state search to vibrational analysis, rely on molecular Hessians, which are the second derivatives of the potential energy. Yet, Hessians are computationally expensive to calculate and scale poorly with system size, for both quantum mechanical methods and neural networks. In this work, we demonstrate that Hessians can be predicted directly from a deep learning model, without relying on automatic differentiation or finite differences. We observe that one can construct SE(3)-equivariant, symmetric Hessians from irreducible representation (irrep) features up to degree $l$=2 computed during message passing in graph neural networks. This makes HIP Hessians one to two orders of magnitude faster, more accurate, more memory efficient, and easier to train, and enables more favorable scaling with system size. We validate our predictions across a wide range of downstream tasks, demonstrating consistently superior performance for transition state search, accelerated geometry optimization, zero-point energy corrections, and vibrational analysis benchmarks. We open-source the HIP codebase and model weights to enable further development of the direct prediction of Hessians at https://github.com/BurgerAndreas/hip

https://github.com/BurgerAndreas/hip

Torch Geometric Pool: the Pytorch library for pooling in Graph Neural Networks 2025-12-14
Show

We introduce Torch Geometric Pool (tgp), a library for hierarchical pooling in Graph Neural Networks. Built upon PyTorch Geometric, tgp provides a wide variety of pooling operators, unified under a consistent API and a modular design. The library emphasizes usability and extensibility, and includes features like precomputed pooling, which significantly accelerates training for a class of operators. In this paper, we describe tgp's structure and present an extensive benchmark that showcases the library's features and systematically compares the performance of the implemented graph-pooling methods on different downstream tasks. The results, showing that the choice of the optimal pooling operator depends on the task and data at hand, support the need for a library that enables fast prototyping.
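
Since we do not reproduce tgp's exact interface here, the sketch below shows the general hierarchical-pooling pattern the library unifies, using plain PyTorch Geometric's TopKPooling:

```python
# Generic hierarchical pooling in PyTorch Geometric: convolve, pool away
# half the nodes, convolve again, then read out a graph-level embedding.
import torch
from torch_geometric.nn import GCNConv, TopKPooling, global_mean_pool

class HierarchicalGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden=64, classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.pool1 = TopKPooling(hidden, ratio=0.5)   # keep the top 50% of nodes
        self.conv2 = GCNConv(hidden, hidden)
        self.lin = torch.nn.Linear(hidden, classes)

    def forward(self, x, edge_index, batch):
        x = torch.relu(self.conv1(x, edge_index))
        x, edge_index, _, batch, _, _ = self.pool1(x, edge_index, batch=batch)
        x = torch.relu(self.conv2(x, edge_index))
        return self.lin(global_mean_pool(x, batch))   # graph-level readout
```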

DGTEN: A Robust Deep Gaussian based Graph Neural Network for Dynamic Trust Evaluation with Uncertainty-Quantification Support 2025-12-14
Show

Dynamic trust evaluation in large, rapidly evolving graphs demands models that capture changing relationships, express calibrated confidence, and resist adversarial manipulation. DGTEN (Deep Gaussian-Based Trust Evaluation Network) introduces a unified graph-based framework that does all three by combining uncertainty-aware message passing, expressive temporal modeling, and built-in defenses against trust-targeted attacks. It represents nodes and edges as Gaussian distributions so that both semantic signals and epistemic uncertainty propagate through the graph neural network, enabling risk-aware trust decisions rather than overconfident guesses. To track how trust evolves, it layers hybrid absolute-Gaussian-hourglass positional encoding with Kolmogorov-Arnold network-based unbiased multi-head attention, then applies an ordinary differential equation-based residual learning module to jointly model abrupt shifts and smooth trends. Robust adaptive ensemble coefficient analysis prunes or down-weights suspicious interactions using complementary cosine and Jaccard similarity, curbing reputation laundering, sabotage, and on-off attacks. On two signed Bitcoin trust networks, DGTEN delivers standout gains where it matters most: in single-timeslot prediction on Bitcoin-OTC, it improves MCC by +12.34% over the best dynamic baseline; in the cold-start scenario on Bitcoin-Alpha, it achieves a +25.00% MCC improvement, the largest across all tasks and datasets; while under adversarial on-off attacks, it surpasses the baseline by up to +10.23% MCC. These results endorse the unified DGTEN framework.

15 pages, 6 figures, 5 tables

Empirical Mode Decomposition and Graph Transformation of the MSCI World Index: A Multiscale Topological Analysis for Graph Neural Network Modeling 2025-12-14
Show

This study applies Empirical Mode Decomposition (EMD) to the MSCI World index and converts the resulting intrinsic mode functions (IMFs) into graph representations to enable modeling with graph neural networks (GNNs). Using CEEMDAN, we extract nine IMFs spanning high-frequency fluctuations to long-term trends. Each IMF is transformed into a graph using four time-series-to-graph methods: natural visibility, horizontal visibility, recurrence, and transition graphs. Topological analysis shows clear scale-dependent structure: high-frequency IMFs yield dense, highly connected small-world graphs, whereas low-frequency IMFs produce sparser networks with longer characteristic path lengths. Visibility-based methods are more sensitive to amplitude variability and typically generate higher clustering, while recurrence graphs better preserve temporal dependencies. These results provide guidance for designing GNN architectures tailored to the structural properties of decomposed components, supporting more effective predictive modeling of financial time series.
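
Of the four conversions, the natural visibility graph is the easiest to reproduce; the following compact implementation follows the standard visibility criterion (a straight line between two samples must clear every intermediate sample):

```python
# Natural visibility graph: node a "sees" node b if the straight line between
# the samples stays above every intermediate sample. Applied per IMF.
import networkx as nx

def natural_visibility_graph(y):
    G = nx.Graph()
    G.add_nodes_from(range(len(y)))
    for a in range(len(y)):
        for b in range(a + 1, len(y)):
            visible = all(
                y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                for c in range(a + 1, b)
            )
            if visible:
                G.add_edge(a, b)
    return G

imf = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]     # toy IMF values
G = natural_visibility_graph(imf)
print(nx.density(G), nx.average_clustering(G))
```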

19 pages, 3 figures, 6 tables

GoMS: Graph of Molecule Substructure Network for Molecule Property Prediction 2025-12-13
Show

While graph neural networks have shown remarkable success in molecular property prediction, current approaches like the Equivariant Subgraph Aggregation Networks (ESAN) treat molecules as bags of independent substructures, overlooking crucial relationships between these components. We present Graph of Molecule Substructures (GoMS), a novel architecture that explicitly models the interactions and spatial arrangements between molecular substructures. Unlike ESAN's bag-based representation, GoMS constructs a graph where nodes represent subgraphs and edges capture their structural relationships, preserving critical topological information about how substructures are connected and overlap within the molecule. Through extensive experiments on public molecular datasets, we demonstrate that GoMS outperforms ESAN and other baseline methods, with particularly large improvements for molecules containing more than 100 atoms. The performance gap widens as molecular size increases, demonstrating GoMS's effectiveness for modeling industrial-scale molecules. Our theoretical analysis demonstrates that GoMS can distinguish molecules with identical subgraph compositions but different spatial arrangements. Our approach shows particular promise for materials science applications involving complex molecules where properties emerge from the interplay between multiple functional units. By capturing substructure relationships that are lost in bag-based approaches, GoMS represents a significant advance toward scalable and interpretable molecular property prediction for real-world applications.

Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification 2025-12-13
Show

Graph Neural Networks (GNNs) have demonstrated strong performance across tasks such as node classification, link prediction, and graph classification, but remain vulnerable to backdoor attacks that implant imperceptible triggers during training to control predictions. While node-level attacks exploit local message passing, graph-level attacks face the harder challenge of manipulating global representations while maintaining stealth. We identify two main sources of anomaly in existing graph classification backdoor methods: structural deviation from rare subgraph triggers and semantic deviation caused by label flipping, both of which make poisoned graphs easily detectable by anomaly detection models. To address this, we propose DPSBA, a clean-label backdoor framework that learns in-distribution triggers via adversarial training guided by anomaly-aware discriminators. DPSBA effectively suppresses both structural and semantic anomalies, achieving high attack success while significantly improving stealth. Extensive experiments on real-world datasets validate that DPSBA achieves a superior balance between effectiveness and detectability compared to state-of-the-art baselines.

Accepted by NeurIPS 2025

High-Dimensional Tensor Discriminant Analysis: Low-Rank Discriminant Structure, Representation Synergy, and Theoretical Guarantees 2025-12-13
Show

High-dimensional tensor-valued predictors arise in modern applications, increasingly as learned representations from neural networks. Existing tensor classification methods rely on sparsity or Tucker structures and often lack theoretical guarantees. Motivated by empirical evidence that discriminative signals concentrate along a few multilinear components, we introduce a CP low-rank structure for the discriminant tensor, a modeling perspective not previously explored. Under a Tensor Gaussian Mixture Model, we propose high-dimensional CP low-rank Tensor Discriminant Analysis (CP-TDA) with Randomized Composite PCA (rc-PCA) initialization, which is essential for handling dependent and anisotropic noise under weaker signal-strength and incoherence conditions, followed by an iterative refinement algorithm. We establish global convergence and minimax-optimal misclassification rates. To handle tensor data deviating from tensor normality, we develop the first semiparametric tensor discriminant model, in which learned tensor representations are mapped via deep generative models into a latent space tailored for CP-TDA. The misclassification risk decomposes into representation, approximation, and estimation errors. Numerical studies and real data analysis on graph classification demonstrate substantial gains over existing tensor classifiers and state-of-the-art graph neural networks, particularly in high-dimensional, small-sample regimes.
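
For reference, a CP low-rank assumption on a discriminant tensor takes the standard form below (generic CP notation; the symbols are not taken from the paper):

```latex
% Rank-R CP decomposition of an order-K tensor (generic notation)
\mathcal{B} = \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r^{(1)} \circ \mathbf{a}_r^{(2)} \circ \cdots \circ \mathbf{a}_r^{(K)}
```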

Dynamic Fraud Detection: Integrating Reinforcement Learning into Graph Neural Networks 2025-12-12
Show

Financial fraud refers to the act of obtaining financial benefits through dishonest means. Such behavior not only disrupts the order of the financial market but also harms economic and social development and breeds other illegal and criminal activities. With the popularization of the internet and online payment methods, many fraudulent activities and money laundering behaviors in life have shifted from offline to online, posing a great challenge to regulatory authorities. How to efficiently detect these financial fraud activities has become an urgent issue that needs to be resolved. Graph neural networks are a type of deep learning model that can utilize the interactive relationships within graph structures, and they have been widely applied in the field of fraud detection. However, there are still some issues. First, fraudulent activities only account for a very small part of transaction transfers, leading to an inevitable problem of label imbalance in fraud detection. At the same time, fraudsters often disguise their behavior, which can have a negative impact on the final prediction results. In addition, existing research has overlooked the importance of balancing neighbor information and central node information. For example, when the central node has too many neighbors, the features of the central node itself are often neglected. Finally, fraud activities and patterns are constantly changing over time, so considering the dynamic evolution of graph edge relationships is also very important.

I request the withdrawal of this paper because serious concerns have been identified regarding the originality of the manuscript, and the authors suspect that the fifth author may have engaged in plagiarism, including the possible use of unpublished results from a manuscript affiliated with Donghua University

Generative Graph Pattern Machine 2025-12-12
Show

Graph neural networks (GNNs) have been predominantly driven by message-passing, where node representations are iteratively updated via local neighborhood aggregation. Despite their success, message-passing suffers from fundamental limitations -- including constrained expressiveness, over-smoothing, over-squashing, and limited capacity to model long-range dependencies. These issues hinder scalability: increasing data size or model size often fails to yield improved performance. To this end, we explore pathways beyond message-passing and introduce Generative Graph Pattern Machine (G$^2$PM), a generative Transformer pre-training framework for graphs. G$^2$PM represents graph instances (nodes, edges, or entire graphs) as sequences of substructures, and employs generative pre-training over the sequences to learn generalizable and transferable representations. Empirically, G$^2$PM demonstrates strong scalability: on the ogbn-arxiv benchmark, it continues to improve with model sizes up to 60M parameters, outperforming prior generative approaches that plateau at significantly smaller scales (e.g., 3M). In addition, we systematically analyze the model design space, highlighting key architectural choices that contribute to its scalability and generalization. Across diverse tasks -- including node/link/graph classification, transfer learning, and cross-graph pretraining -- G$^2$PM consistently outperforms strong baselines, establishing a compelling foundation for scalable graph learning. The code and dataset are available at https://github.com/Zehong-Wang/G2PM.

Accepted by NeurIPS 2025

Context-Aware Agentic Power Resources Optimisation in EV using Smart2ChargeApp 2025-12-12
Show

This paper presents a novel context-sensitive multi-agent coordination for dynamic resource allocation (CAMAC-DRA) framework for optimizing smart electric vehicle (EV) charging ecosystems through the Smart2Charge application. The proposed system coordinates autonomous charging agents across networks of 250 EVs and 45 charging stations while adapting to dynamic environmental conditions through context-aware decision-making. Our multi-agent approach employs coordinated Deep Q-Networks integrated with Graph Neural Networks and attention mechanisms, processing 20 contextual features including weather patterns, traffic conditions, grid load fluctuations, and electricity pricing. The framework balances five ecosystem stakeholders, i.e., EV users (25%), grid operators (20%), charging station operators (20%), fleet operators (20%), and environmental factors (15%), through weighted coordination mechanisms and consensus protocols. Comprehensive validation using real-world datasets containing 441,077 charging transactions demonstrates superior performance compared to baseline algorithms including DDPG, A3C, PPO, and GNN approaches. The CAMAC-DRA framework achieves a 92% coordination success rate, 15% energy efficiency improvement, 10% cost reduction, 20% grid strain decrease, and 2.3x faster convergence while maintaining 88% training stability and 85% sample efficiency. Real-world validation confirms commercial viability with Net Present Cost of -$122,962 and 69% cost reduction through renewable energy integration. The framework's unique contribution lies in developing context-aware multi-stakeholder coordination that successfully balances competing objectives while adapting to real-time variables, positioning it as a breakthrough solution for intelligent EV charging coordination and sustainable transportation electrification.

Accelerating Sparse Matrix-Matrix Multiplication on GPUs with Processing Near HBMs 2025-12-12
Show

Sparse General Matrix-Matrix Multiplication (SpGEMM) is a fundamental operation in numerous scientific computing and data analytics applications, often bottlenecked by irregular memory access patterns. This paper presents Hash based Multi-phase SpGEMM on GPU and the Acceleration of Indirect Memory Access (AIA) technique, a novel custom near-memory processing approach to optimizing SpGEMM on GPU HBM. Our hardware-software co-designed framework for SpGEMM demonstrates significant performance improvements over state-of-the-art methods, particularly in handling complex, application-specific workloads. We evaluate our approach on various graph workloads, including graph contraction, Markov clustering, and Graph Neural Networks (GNNs), showcasing its practical applicability. For graph analytics applications, AIA demonstrates up to 17.3% time reduction from the software-only implementation, while achieving time reduction of 76.5% for Graph Contraction and 58.4% for Markov Clustering compared to cuSPARSE. For GNN training applications with structured global pruning, our hybrid approach delivers an average of 1.43x speedup over software-only implementation across six benchmark datasets and three architectures (GCN, GIN, GraphSAGE), and shows 1.95x speedup for GNN workloads when compared to cuSPARSE, with up to 4.18x gains on large-scale datasets.
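
The hash-based row-wise scheme being accelerated is the classic Gustavson formulation; a plain-Python sketch (CSR-like rows as dicts, purely illustrative) is:

```python
# Row-wise hash SpGEMM: accumulate each output row of C = A @ B in a hash
# table keyed by column index, which is what GPU hash kernels parallelize.
def spgemm(A_rows, B_rows):
    # A_rows / B_rows: list of {col: value} dicts, one per row (CSR-like)
    C_rows = []
    for a_row in A_rows:
        acc = {}
        for k, a_val in a_row.items():
            for j, b_val in B_rows[k].items():
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        C_rows.append(acc)
    return C_rows

A = [{0: 2.0, 2: 1.0}, {1: 3.0}]
B = [{1: 4.0}, {0: 1.0}, {2: 5.0}]
print(spgemm(A, B))   # [{1: 8.0, 2: 5.0}, {0: 3.0}]
```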

13 pages, 11 figures
Exploring Spatial-Temporal Representation via Star Graph for mmWave Radar-based Human Activity Recognition 2025-12-12
Show

Human activity recognition (HAR) requires extracting accurate spatial-temporal features from human movements. A mmWave radar point cloud-based HAR system suffers from sparsity and variable-size problems due to the physical features of the mmWave signal. Existing works usually borrow the preprocessing algorithms of vision-based systems with dense point clouds, which may not be optimal for mmWave radar systems. In this work, we proposed a graph representation with a discrete dynamic graph neural network (DDGNN) to explore the spatial-temporal representation of human movement-related features. Specifically, we designed a star graph to describe the high-dimensional relative relationship between a manually added static center point and the dynamic mmWave radar points in the same and consecutive frames. We then adopted DDGNN to learn the features residing in the star graph with variable sizes. Experimental results demonstrated that our approach outperformed other baseline methods on real-world HAR datasets. Our system achieved an overall classification accuracy of 94.27%, approaching the near-optimal performance of a vision-based skeleton-data accuracy of 97.25%. We also conducted an inference test on Raspberry Pi 4 to demonstrate its effectiveness on resource-constrained platforms. We provided a comprehensive ablation study for variable DDGNN structures to validate our model design. Our system also outperformed three recent radar-specific methods without requiring resampling or frame aggregators.

Bridging Data and Physics: A Graph Neural Network-Based Hybrid Twin Framework 2025-12-12
Show

Simulating complex unsteady physical phenomena relies on detailed mathematical models, solved for instance using the Finite Element Method (FEM). However, these models often exhibit discrepancies from reality due to unmodeled effects or simplifying assumptions. We refer to this gap as the ignorance model. While purely data-driven approaches attempt to learn the full system behavior, they require large amounts of high-quality data across the entire spatial and temporal domain. In real-world scenarios, such information is unavailable, making fully data-driven modeling unreliable. To overcome this limitation, we model the ignorance component using a hybrid twin approach, instead of simulating the phenomena from scratch. Since physics-based models approximate the overall behavior of the phenomena, the remaining ignorance is typically lower in complexity than the full physical response and can therefore be learned with significantly fewer data. A key difficulty, however, is that spatial measurements are sparse; moreover, obtaining data that measure the same phenomenon under different spatial configurations is challenging in practice. Our contribution is to overcome this limitation by using Graph Neural Networks (GNNs) to represent the ignorance model. GNNs learn the spatial pattern of the missing physics even when the number of measurement locations is limited. This allows us to enrich the physics-based model with data-driven corrections without requiring dense spatial, temporal, and parametric data. To showcase the performance of the proposed method, we evaluate this GNN-based hybrid twin on nonlinear heat transfer problems across different meshes, geometries, and load positions. Results show that the GNN successfully captures the ignorance and generalizes corrections across spatial configurations, improving simulation accuracy and interpretability while minimizing data requirements.
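
The residual-learning idea behind the hybrid twin can be sketched in a few lines (toy 1-D stand-ins for the FEM model and reality; everything here is an illustrative assumption):

```python
# A small network learns only the residual ("ignorance") between sparse
# measurements and the physics model; the corrected prediction is
# physics + learned correction.
import torch

physics = lambda u: 0.5 * u                      # stand-in for an FEM solver
truth = lambda u: 0.5 * u + 0.1 * torch.sin(u)   # reality = physics + unmodeled term

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

u = torch.linspace(-3, 3, 16).unsqueeze(1)       # only 16 sparse "sensors"
residual = truth(u) - physics(u)                 # the ignorance to be learned
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((net(u) - residual) ** 2)
    loss.backward()
    opt.step()

u_test = torch.tensor([[1.5]])
hybrid = physics(u_test) + net(u_test)           # enriched prediction
```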

27 pages, 14 figures
Text2Graph: Combining Lightweight LLMs and GNNs for Efficient Text Classification in Label-Scarce Scenarios 2025-12-12
Show

Large Language Models (LLMs) have become effective zero-shot classifiers, but their high computational requirements and environmental costs limit their practicality for large-scale annotation in high-performance computing (HPC) environments. To support more sustainable workflows, we present Text2Graph, an open-source Python package that provides a modular implementation of existing text-to-graph classification approaches. The framework enables users to combine LLM-based partial annotation with Graph Neural Network (GNN) label propagation in a flexible manner, making it straightforward to swap components such as feature extractors, edge construction methods, and sampling strategies. We benchmark Text2Graph on a zero-shot setting using five datasets spanning topic classification and sentiment analysis tasks, comparing multiple variants against other zero-shot approaches for text classification. In addition to reporting performance, we provide detailed estimates of energy consumption and carbon emissions, showing that graph-based propagation achieves competitive results at a fraction of the energy and environmental cost.
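
A minimal sketch of the annotate-then-propagate pattern (using scikit-learn's LabelSpreading as a stand-in for the GNN component; all names and sizes are illustrative):

```python
# An LLM labels a small seed set; labels then diffuse over a k-NN
# similarity graph built from text embeddings.
import numpy as np
from sklearn.semi_supervised import LabelSpreading

X = np.random.randn(200, 64)          # text embeddings (e.g., from a small LLM)
y = -1 * np.ones(200, dtype=int)      # -1 marks unlabeled documents
y[:20] = np.random.randint(0, 2, 20)  # LLM-annotated seed subset

model = LabelSpreading(kernel="knn", n_neighbors=10)
model.fit(X, y)
pred = model.transduction_            # propagated labels for all 200 documents
```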

Graph Embedding with Mel-spectrograms for Underwater Acoustic Target Recognition 2025-12-12
Show

Underwater acoustic target recognition (UATR) is extremely challenging due to the complexity of ship-radiated noise and the variability of ocean environments. Although deep learning (DL) approaches have achieved promising results, most existing models implicitly assume that underwater acoustic data lie in a Euclidean space. This assumption, however, is unsuitable for the inherently complex topology of underwater acoustic signals, which exhibit non-stationary, non-Gaussian, and nonlinear characteristics. To overcome this limitation, this paper proposes the UATR-GTransformer, a non-Euclidean DL model that integrates Transformer architectures with graph neural networks (GNNs). The model comprises three key components: a Mel patchify block, a GTransformer block, and a classification head. The Mel patchify block partitions the Mel-spectrogram into overlapping patches, while the GTransformer block employs a Transformer Encoder to capture mutual information between split patches to generate Mel-graph embeddings. Subsequently, a GNN enhances these embeddings by modeling local neighborhood relationships, and a feed-forward network (FFN) further performs feature transformation. Experimental results based on two widely used benchmark datasets demonstrate that the UATR-GTransformer achieves performance competitive with state-of-the-art methods. In addition, interpretability analysis reveals that the proposed model effectively extracts rich frequency-domain information, highlighting its potential for applications in ocean engineering.

CAT: Can Trust be Predicted with Context-Awareness in Dynamic Heterogeneous Networks? 2025-12-12
Show

Trust prediction provides valuable support for decision-making, risk mitigation, and system security enhancement. Recently, Graph Neural Networks (GNNs) have emerged as a promising approach for trust prediction, owing to their ability to learn expressive node representations that capture intricate trust relationships within a network. However, current GNN-based trust prediction models face several limitations: (i) Most of them fail to capture trust dynamicity, leading to questionable inferences. (ii) They rarely consider the heterogeneous nature of real-world networks, resulting in a loss of rich semantics. (iii) None of them support context-awareness, a basic property of trust, making prediction results coarse-grained. To this end, we propose CAT, the first Context-Aware GNN-based Trust prediction model that supports trust dynamicity and accurately represents real-world heterogeneity. CAT consists of a graph construction layer, an embedding layer, a heterogeneous attention layer, and a prediction layer. It handles dynamic graphs using continuous-time representations and captures temporal information through a time encoding function. To model graph heterogeneity and leverage semantic information, CAT employs a dual attention mechanism that identifies the importance of different node types and nodes within each type. For context-awareness, we introduce a new notion of meta-paths to extract contextual features. By constructing context embeddings and integrating a context-aware aggregator, CAT can predict both context-aware trust and overall trust. Extensive experiments on three real-world datasets demonstrate that CAT outperforms five groups of baselines in trust prediction, while exhibiting strong scalability to large-scale graphs and robustness against both trust-oriented and GNN-oriented attacks.
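
Continuous-time encodings of the kind described are typically small sinusoidal maps; the following TGAT-style sketch is an assumption about the form, not CAT's exact layer:

```python
# A learnable sinusoidal time encoding: maps a scalar timestamp to a vector
# so temporal information can enter the attention layers.
import torch

class TimeEncoder(torch.nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.w = torch.nn.Parameter(torch.randn(dim))
        self.b = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, t):                 # t: (batch,) timestamps
        return torch.cos(t.unsqueeze(-1) * self.w + self.b)

enc = TimeEncoder()
print(enc(torch.tensor([0.0, 3.5, 100.0])).shape)  # (3, 16)
```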

Condensation-Concatenation Framework for Dynamic Graph Continual Learning 2025-12-12
Show

Dynamic graphs are prevalent in real-world scenarios, where continuous structural changes induce catastrophic forgetting in graph neural networks (GNNs). While continual learning has been extended to dynamic graphs, existing methods overlook the effects of topological changes on existing nodes. To address it, we propose a novel framework for continual learning on dynamic graphs, named Condensation-Concatenation-based Continual Learning (CCC). Specifically, CCC first condenses historical graph snapshots into compact semantic representations while aiming to preserve the original label distribution and topological properties. Then it concatenates these historical embeddings with current graph representations selectively. Moreover, we refine the forgetting measure (FM) to better adapt to dynamic graph scenarios by quantifying the predictive performance degradation of existing nodes caused by structural updates. CCC demonstrates superior performance over state-of-the-art baselines across four real-world datasets in extensive experiments.

AGAPI-Agents: An Open-Access Agentic AI Platform for Accelerated Materials Design on AtomGPT.org 2025-12-12
Show

Artificial intelligence is reshaping scientific discovery, yet its use in materials research remains limited by fragmented computational ecosystems, reproducibility challenges, and dependence on commercial large language models (LLMs). Here we introduce AGAPI (AtomGPT.org API), an open-access agentic AI platform that integrates more than eight open-source LLMs with over twenty materials-science API endpoints, unifying databases, simulation tools, and machine-learning models through a common orchestration framework. AGAPI employs an Agent-Planner-Executor-Summarizer architecture that autonomously constructs and executes multi-step workflows spanning materials data retrieval, graph neural network property prediction, machine-learning force-field optimization, tight-binding calculations, diffraction analysis, and inverse design. We demonstrate AGAPI through end-to-end workflows, including heterostructure construction, powder X-ray diffraction analysis, and semiconductor defect engineering requiring up to ten sequential operations. In addition, we evaluate AGAPI using 30+ example prompts as test cases and compare agentic predictions with and without tool access against experimental data. With more than 1,000 active users, AGAPI provides a scalable and transparent foundation for reproducible, AI-accelerated materials discovery. AGAPI-Agents codebase is available at https://github.com/atomgptlab/agapi.

BugSweeper: Function-Level Detection of Smart Contract Vulnerabilities Using Graph Neural Networks 2025-12-12
Show

The rapid growth of Ethereum has made it more important to quickly and accurately detect smart contract vulnerabilities. While machine-learning-based methods have shown some promise, many still rely on rule-based preprocessing designed by domain experts. Rule-based preprocessing methods often discard crucial context from the source code, potentially causing certain vulnerabilities to be overlooked and limiting adaptability to newly emerging threats. We introduce BugSweeper, an end-to-end deep learning framework that detects vulnerabilities directly from the source code without manual engineering. BugSweeper represents each Solidity function as a Function-Level Abstract Syntax Graph (FLAG), a novel graph that combines its Abstract Syntax Tree (AST) with enriched control-flow and data-flow semantics. Then, our two-stage Graph Neural Network (GNN) analyzes these graphs. The first-stage GNN filters noise from the syntax graphs, while the second-stage GNN conducts high-level reasoning to detect diverse vulnerabilities. Extensive experiments on real-world contracts show that BugSweeper significantly outperforms all state-of-the-art detection methods. By removing the need for handcrafted rules, our approach offers a robust, automated, and scalable solution for securing smart contracts without any dependence on security experts.
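
To illustrate the AST-to-graph step, the sketch below uses Python's ast module in place of a Solidity parser (the paper's FLAG additionally adds control-flow and data-flow edges, which are omitted here):

```python
# Each AST node becomes a graph node; parent-child syntax relations
# become directed edges.
import ast
import networkx as nx

def ast_to_graph(source: str) -> nx.DiGraph:
    tree = ast.parse(source)
    G = nx.DiGraph()
    for node in ast.walk(tree):
        G.add_node(id(node), label=type(node).__name__)
        for child in ast.iter_child_nodes(node):
            G.add_edge(id(node), id(child))
    return G

G = ast_to_graph("def f(x):\n    return x + 1\n")
print([G.nodes[n]["label"] for n in G.nodes])
```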

This paper is accepted to AAAI 2026

Personalized Pricing in Social Networks with Individual and Group Fairness Considerations 2025-12-12
Show

Personalized pricing assigns different prices to customers for the same product based on customer-specific features to improve retailer revenue. However, this practice often raises concerns about fairness at both the individual and group levels. At the individual level, a customer may perceive unfair treatment if he/she notices being charged a higher price than others. At the group level, pricing disparities can result in discrimination against certain protected groups, such as those defined by gender or race. Existing studies on fair pricing typically address individual and group fairness separately. This paper bridges the gap by introducing a new formulation of the personalized pricing problem that incorporates both dimensions of fairness in social network settings. To solve the problem, we propose FairPricing, a novel framework based on graph neural networks (GNNs) that learns a personalized pricing policy using customer features and network topology. In FairPricing, individual perceived unfairness is captured through a penalty on customer demand, and thus the profit objective, while group-level discrimination is mitigated using adversarial debiasing and a price regularization term. Unlike existing optimization-based personalized pricing, which requires re-optimization whenever the network updates, the pricing policy learned by FairPricing assigns personalized prices to all customers in an updated network based on their features and the new network structure, thereby generalizing to network changes. Extensive experimental results show that FairPricing achieves high profitability while improving individual fairness perceptions and satisfying group fairness requirements.

GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism 2025-12-12
Show

Graph neural networks (GNNs), an emerging class of machine learning models for graphs, have gained popularity for their superior performance in various graph analytical tasks. Mini-batch training is commonly used to train GNNs on large graphs, and data parallelism is the standard approach to scale mini-batch training across multiple GPUs. Data parallel approaches contain redundant work as subgraphs sampled by different GPUs contain significant overlap. To address this issue, we introduce a hybrid parallel mini-batch training paradigm called split parallelism. Split parallelism avoids redundant work by splitting the sampling, loading, and training of each mini-batch across multiple GPUs. Split parallelism, however, introduces communication overheads that can be more than the savings from removing redundant work. We further present a lightweight partitioning algorithm that probabilistically minimizes these overheads. We implement split parallelism in GSplit and show that it outperforms state-of-the-art mini-batch training systems like DGL, Quiver, and $P^3$.

Published at MLSys 2025. OpenReview: https://openreview.net/forum?id=cTOx1YTBgh

Refining Graphical Neural Network Predictions Using Flow Matching for Optimal Power Flow with Constraint-Satisfaction Guarantee 2025-12-11
Show

The DC Optimal Power Flow (DC-OPF) problem is fundamental to power system operations, requiring rapid solutions for real-time grid management. While traditional optimization solvers provide optimal solutions, their computational cost becomes prohibitive for large-scale systems requiring frequent recalculations. Machine learning approaches offer promise for acceleration but often struggle with constraint satisfaction and cost optimality. We present a novel two-stage learning framework that combines physics-informed Graph Neural Networks (GNNs) with Continuous Flow Matching (CFM) for solving DC-OPF problems. Our approach embeds fundamental physical principles--including economic dispatch optimality conditions, Kirchhoff's laws, and Karush-Kuhn-Tucker (KKT) complementarity conditions--directly into the training objectives. The first stage trains a GNN to produce feasible initial solutions by learning from physics-informed losses that encode power system constraints. The second stage employs CFM, a simulation-free continuous normalizing flow technique, to refine these solutions toward optimality through learned vector field regression. Evaluated on the IEEE 30-bus system across five load scenarios ranging from 70% to 130% nominal load, our method achieves near-optimal solutions with cost gaps below 0.1% for nominal loads and below 3% for extreme conditions, while maintaining 100% feasibility. Our framework bridges the gap between fast but approximate neural network predictions and optimal but slow numerical solvers, offering a practical solution for modern power systems with high renewable penetration requiring frequent dispatch updates.
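
The physics-informed part of the objective can be sketched as penalty terms added to the supervised loss; the weights and constraint set below are illustrative assumptions, not the authors' exact losses:

```python
# Physics-informed loss sketch for dispatch: supervised error plus penalties
# for power-balance violation and generator limit violations.
import torch

def physics_informed_loss(p_pred, p_true, demand, p_min, p_max, lam=10.0):
    supervised = torch.mean((p_pred - p_true) ** 2)
    balance = (p_pred.sum() - demand) ** 2                 # total generation = load
    limits = torch.relu(p_pred - p_max).sum() + torch.relu(p_min - p_pred).sum()
    return supervised + lam * (balance + limits)

p_pred = torch.tensor([0.9, 1.4, 0.6], requires_grad=True)
loss = physics_informed_loss(p_pred, torch.tensor([1.0, 1.3, 0.7]),
                             demand=3.0, p_min=torch.zeros(3),
                             p_max=torch.full((3,), 1.2))
loss.backward()
```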

Empowering GNNs for Domain Adaptation via Denoising Target Graph 2025-12-11
Show

We explore the node classification task in the context of graph domain adaptation, which uses both source and target graph structures along with source labels to enhance the generalization capabilities of Graph Neural Networks (GNNs) on target graphs. Structure domain shifts frequently occur, especially when graph data are collected at different times or from varying areas, resulting in poor performance of GNNs on target graphs. Surprisingly, we find that simply incorporating an auxiliary loss function for denoising graph edges on target graphs can be extremely effective in enhancing GNN performance on target graphs. Based on this insight, we propose GraphDeT, a framework that integrates this auxiliary edge task into GNN training for node classification under domain adaptation. Our theoretical analysis connects this auxiliary edge task to the graph generalization bound with -distance, demonstrating that this auxiliary task imposes a constraint which tightens the bound and thereby improves generalization. The experimental results demonstrate superior performance compared to the existing baselines in handling both time and regional domain graph shifts.

LGAN: An Efficient High-Order Graph Neural Network via the Line Graph Aggregation 2025-12-11
Show

Graph Neural Networks (GNNs) have emerged as a dominant paradigm for graph classification. Most existing GNNs rely on a message-passing strategy between neighboring nodes, whose expressivity is limited by the 1-dimensional Weisfeiler-Lehman (1-WL) test. Although a number of k-WL-based GNNs have been proposed to overcome this limitation, their computational cost increases rapidly with k, significantly restricting their practical applicability. Moreover, since k-WL models mainly operate on node tuples, k-WL-based GNNs cannot retain the fine-grained node- or edge-level semantics required by attribution methods (e.g., Integrated Gradients), which hurts interpretability. To overcome these shortcomings, we propose a novel Line Graph Aggregation Network (LGAN), which constructs a line graph from the induced subgraph centered at each node to perform higher-order aggregation. We theoretically prove that LGAN not only possesses greater expressive power than 2-WL under injective aggregation assumptions, but also has lower time complexity. Empirical evaluations on benchmarks demonstrate that LGAN outperforms state-of-the-art k-WL-based GNNs, while offering better interpretability.
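
The core construction is two standard graph operations; with networkx (illustrative, not the authors' code):

```python
# Take the subgraph induced by a node's neighborhood, then build its line
# graph: its nodes are the subgraph's edges, and its edges link edge pairs
# that share an endpoint.
import networkx as nx

G = nx.karate_club_graph()
ego = nx.ego_graph(G, 0, radius=1)   # induced subgraph centered at node 0
L = nx.line_graph(ego)               # line graph for higher-order aggregation
print(ego.number_of_edges(), L.number_of_nodes())  # equal by construction
```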

Formula size game and model checking for modal substitution calculus 2025-12-11
Show

Recent research has applied modal substitution calculus (MSC) and its variants to characterize various computational frameworks such as graph neural networks (GNNs) and distributed computing systems. For example, it has been shown that the expressive power of recurrent graph neural networks coincides with graded modal substitution calculus GMSC, which is the extension of MSC with counting modalities. GMSC can be further extended with the counting global modality, resulting in the logic GGMSC which corresponds to GNNs with global readout mechanisms. In this paper we introduce a formula-size game that characterizes the expressive power of MSC, GMSC, GGMSC, and related logics. Furthermore, we study the expressiveness and model checking of logics in this family. We prove that MSC and its extensions (GMSC, GGMSC) are as expressive as linear tape-bounded Turing machines, while asynchronous variants are linked to modal mu-calculus and modal computation logic MCL. We establish that for MSC, GMSC and GGMSC, both combined and data complexity of model checking are PSPACE-complete, and for their asynchronous variants, both complexities are PTIME-complete. We also establish that for the propositional fragment SC of MSC, the combined complexity of model checking is PSPACE-complete, while for asynchronous SC it is PTIME-complete, and in both cases, data complexity is constant. As a corollary, we observe that SC satisfiability is PSPACE-complete and NP-complete for its asynchronous variant. Finally, we construct a universal reduction from all recursively enumerable problems to MSC model checking.

THeGAU: Type-Aware Heterogeneous Graph Autoencoder and Augmentation 2025-12-11
Show

Heterogeneous Graph Neural Networks (HGNNs) are effective for modeling Heterogeneous Information Networks (HINs), which encode complex multi-typed entities and relations. However, HGNNs often suffer from type information loss and structural noise, limiting their representational fidelity and generalization. We propose THeGAU, a model-agnostic framework that combines a type-aware graph autoencoder with guided graph augmentation to improve node classification. THeGAU reconstructs schema-valid edges as an auxiliary task to preserve node-type semantics and introduces a decoder-driven augmentation mechanism to selectively refine noisy structures. This joint design enhances robustness, accuracy, and efficiency while significantly reducing computational overhead. Extensive experiments on three benchmark HIN datasets (IMDB, ACM, and DBLP) demonstrate that THeGAU consistently outperforms existing HGNN methods, achieving state-of-the-art performance across multiple backbones.

Mr. Virgil: Learning Multi-robot Visual-range Relative Localization 2025-12-11
Show

Ultra-wideband (UWB)-vision fusion localization has achieved extensive application in multi-agent relative localization. The challenging matching problem between robots and visual detections renders existing methods highly dependent on identity-encoded hardware or delicately tuned algorithms. Overconfident yet erroneous matches may cause irreversible damage to the localization system. To address this issue, we introduce Mr. Virgil, an end-to-end learning-based multi-robot visual-range relative localization framework, consisting of a graph neural network for data association between UWB rangings and visual detections, and a differentiable pose graph optimization (PGO) back-end. The graph-based front-end supplies robust matching results, accurate initial position predictions, and credible uncertainty estimates, which are subsequently integrated into the PGO back-end to improve the accuracy of the final pose estimation. Additionally, a decentralized version of the system is implemented for real-world applications. Experiments spanning varying numbers of robots, simulated and real-world settings, and occluded and non-occluded conditions demonstrate the stability and accuracy of our approach across diverse scenes compared to conventional methods. Our code is available at: https://github.com/HiOnes/Mr-Virgil.

Accepted by 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
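
For flavor, a minimal sketch of the data-association half under our own assumptions: the paper's learned GNN front-end is replaced here by a given matrix of match probabilities, and a one-to-one robot-to-detection assignment is recovered with the Hungarian algorithm from SciPy.

```python
# Hypothetical stand-in for the learned matcher: Hungarian assignment on
# -log(match probability), dropping low-confidence pairs.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(match_prob, min_conf=0.5):
    cost = -np.log(np.clip(match_prob, 1e-6, 1.0))
    rows, cols = linear_sum_assignment(cost)
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if match_prob[r, c] >= min_conf]

match_prob = np.array([[0.92, 0.05, 0.03],
                       [0.10, 0.75, 0.15],
                       [0.20, 0.30, 0.50]])   # robots x detections
print(associate(match_prob))                  # [(0, 0), (1, 1), (2, 2)]
```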

From Lab to Reality: A Practical Evaluation of Deep Learning Models and LLMs for Vulnerability Detection 2025-12-11
Show

Vulnerability detection methods based on deep learning (DL) have shown strong performance on benchmark datasets, yet their real-world effectiveness remains underexplored. Recent work suggests that both graph neural network (GNN)-based and transformer-based models, including large language models (LLMs), yield promising results when evaluated on curated benchmark datasets. These datasets are typically characterized by consistent data distributions and heuristic or partially noisy labels. In this study, we systematically evaluate two representative DL models, ReVeal and LineVul, across four representative datasets: Juliet, Devign, BigVul, and ICVul. Each model is trained independently on each dataset, and their code representations are analyzed using t-SNE to uncover vulnerability-related patterns. To assess realistic applicability, we deploy these models along with four pretrained LLMs (Claude 3.5 Sonnet, GPT-o3-mini, GPT-4o, and GPT-5) on a curated dataset, VentiVul, comprising 20 recently (May 2025) fixed vulnerabilities from the Linux kernel. Our experiments reveal that current models struggle to distinguish vulnerable from non-vulnerable code in representation space and generalize poorly across datasets with differing distributions. When evaluated on VentiVul, our newly constructed time-wise out-of-distribution dataset, performance drops sharply, with most models failing to detect vulnerabilities reliably. These results expose a persistent gap between academic benchmarks and real-world deployment, emphasizing the value of our deployment-oriented evaluation framework and the need for more robust code representations and higher-quality datasets.

DDFI: Diverse and Distribution-aware Missing Feature Imputation via Two-step Reconstruction 2025-12-11
Show

Incomplete node features are ubiquitous in real-world scenarios, e.g., the attributes of web users may be partly private, which causes the performance of Graph Neural Networks (GNNs) to decline significantly. Feature propagation (FP) is a well-known method that performs well for imputing missing node features on graphs, but it still has three issues: 1) it struggles with graphs that are not fully connected, 2) imputed features face the over-smoothing problem, and 3) FP is tailored for transductive tasks, overlooking the feature distribution shift in inductive tasks. To address these challenges, we introduce DDFI, a Diverse and Distribution-aware Missing Feature Imputation method that combines feature propagation with a graph-based Masked AutoEncoder (MAE) in a nontrivial manner. It first designs a simple yet effective algorithm, namely Co-Label Linking (CLL), that randomly connects nodes in the training set with the same label to enhance performance on graphs with numerous connected components. We then develop a novel two-step representation generation process at the inference stage. Specifically, instead of directly using FP-imputed features as input during inference, DDFI further reconstructs the features through the whole MAE to reduce feature distribution shift in inductive tasks and enhance the diversity of node features. Meanwhile, since existing feature imputation methods for graphs are evaluated only by simulating missing-feature scenarios through manual masking, we collect a new dataset called Sailing from voyage records, which contains naturally missing features, to enable better evaluation. Extensive experiments conducted on six public datasets and Sailing show that DDFI outperforms state-of-the-art methods under both transductive and inductive settings.
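
Co-Label Linking is simple enough to sketch directly from its one-sentence description; the `links_per_node` budget and the sampling scheme below are our guesses, not the paper's exact procedure.

```python
# Sketch of Co-Label Linking (CLL): randomly connect training nodes that
# share a label, so feature propagation can cross disconnected components.
import random
from collections import defaultdict

def co_label_links(train_idx, labels, links_per_node=2, seed=0):
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i in train_idx:
        by_label[labels[i]].append(i)
    extra_edges = []
    for i in train_idx:
        peers = [j for j in by_label[labels[i]] if j != i]
        for j in rng.sample(peers, min(links_per_node, len(peers))):
            extra_edges.append((i, j))
    return extra_edges

labels = [0, 1, 0, 1, 0, 1]
print(co_label_links(train_idx=[0, 1, 2, 3, 4, 5], labels=labels))
```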

Enhanced Spatial Clustering of Single-Molecule Localizations with Graph Neural Networks 2025-12-11
Show

Single-molecule localization microscopy generates point clouds corresponding to fluorophore localizations. Spatial cluster identification and analysis of these point clouds are crucial for extracting insights about molecular organization. However, this task becomes challenging in the presence of localization noise, high point density, or complex biological structures. Here, we introduce MIRO (Multifunctional Integration through Relational Optimization), an algorithm that uses recurrent graph neural networks to transform the point clouds in order to improve clustering efficiency when applying conventional clustering techniques. We show that MIRO supports simultaneous processing of clusters of different shapes and at multiple scales, demonstrating improved performance across varied datasets. Our comprehensive evaluation demonstrates MIRO's transformative potential for single-molecule localization applications, showcasing its capability to revolutionize cluster analysis and provide accurate, reliable details of molecular architecture. In addition, MIRO's robust clustering capabilities hold promise for applications in various fields such as neuroscience, for the analysis of neural connectivity patterns, and environmental science, for studying spatial distributions of ecological data.

47 pages, 5 main figures, 3 tables, 3 supplementary figures, 9 supplementary tables. This is the author's version of the article published in Nature Communications under CC BY 4.0. The final published version is available at https://doi.org/10.1038/s41467-025-65557-7
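
The workflow the abstract describes (learn a point-cloud transform, then run a conventional clusterer) can be sketched in a few lines; the identity `transform` below stands in for MIRO's recurrent GNN, and the DBSCAN parameters are arbitrary.

```python
# Sketch: cluster localizations after a learned transform. The stand-in
# transform is the identity; MIRO would supply a recurrent GNN here.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_after_transform(points, transform, eps=0.3, min_samples=5):
    z = transform(points)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(z)

rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(2, 0.1, (50, 2))])
labels = cluster_after_transform(pts, transform=lambda p: p)
print(np.unique(labels))   # e.g. [0 1] for two well-separated blobs
```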

Better Prevent than Tackle: Valuing Defense in Soccer Based on Graph Neural Networks 2025-12-11
Show

Evaluating defensive performance in soccer remains challenging, as effective defending is often expressed not through visible on-ball actions such as interceptions and tackles, but through preventing dangerous opportunities before they arise. Existing approaches have largely focused on valuing on-ball actions, leaving much of defenders' true impact unmeasured. To address this gap, we propose DEFCON (DEFensive CONtribution evaluator), a comprehensive framework that quantifies player-level defensive contributions for every attacking situation in soccer. Leveraging Graph Attention Networks, DEFCON estimates the success probability and expected value of each attacking option, along with each defender's responsibility for stopping it. These components yield an Expected Possession Value (EPV) for the attacking team before and after each action, and DEFCON assigns positive or negative credits to defenders according to whether they reduced or increased the opponent's EPV. Trained on 2023-24 and evaluated on 2024-25 Eredivisie event and tracking data, DEFCON's aggregated player credits exhibit strong positive correlations with market valuations. Finally, we showcase several practical applications, including in-game timelines of defensive contributions, spatial analyses across pitch zones, and pairwise summaries of attacker-defender interactions.
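
The credit rule itself is arithmetic, so a toy sketch follows; the responsibilities and EPV numbers are invented here, whereas the real system derives both from its Graph Attention Networks.

```python
# Toy sketch of DEFCON's credit rule: a defender's credit on an attacking
# option is their responsibility share times the drop in the attacking
# team's Expected Possession Value (EPV).
def defender_credit(responsibility, epv_before, epv_after):
    """Positive credit when a defender helped reduce opponent EPV."""
    delta = epv_before - epv_after
    return {d: round(r * delta, 4) for d, r in responsibility.items()}

resp = {"CB_4": 0.6, "RB_2": 0.3, "DM_6": 0.1}   # responsibilities sum to 1
print(defender_credit(resp, epv_before=0.12, epv_after=0.05))
# {'CB_4': 0.042, 'RB_2': 0.021, 'DM_6': 0.007}
```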

A Kernel-based Resource-efficient Neural Surrogate for Multi-fidelity Prediction of Aerodynamic Field 2025-12-11
Show

Surrogate models provide fast alternatives to costly aerodynamic simulations and are extremely useful in design and optimization applications. This study proposes the use of KHRONOS, a recent kernel-based neural surrogate, to blend sparse high-fidelity (HF) data with low-fidelity (LF) information and predict aerodynamic fields under varying constraints on computational resources. Unlike traditional approaches, KHRONOS is built upon variational principles, interpolation theory, and tensor decomposition. These elements provide a mathematical basis for heavy pruning compared to dense neural networks. Using the AirfRANS dataset as a high-fidelity benchmark and NeuralFoil to generate low-fidelity counterparts, this work compares the performance of KHRONOS with three contemporary model architectures: a multilayer perceptron (MLP), a graph neural network (GNN), and a physics-informed neural network (PINN). We consider varying levels of high-fidelity data availability (0%, 10%, and 30%) and increasingly complex geometry parameterizations, which are used to predict the surface pressure coefficient distribution over the airfoil. Results indicate that, whilst all models eventually achieve comparable predictive accuracy, KHRONOS excels in resource-constrained conditions. In this domain, KHRONOS consistently requires orders of magnitude fewer trainable parameters and delivers much faster training and inference than contemporary dense neural networks at comparable accuracy. These findings highlight the potential of KHRONOS and similar architectures to balance accuracy and efficiency in multi-fidelity aerodynamic field prediction.

24 pages, 15 figures

Graph Neural Network Based Adaptive Threat Detection for Cloud Identity and Access Management Logs 2025-12-11
Show

The rapid expansion of cloud infrastructures and distributed identity systems has significantly increased the complexity and attack surface of modern enterprises. Traditional rule-based or signature-driven detection systems are often inadequate for identifying novel or evolving threats within Identity and Access Management (IAM) logs, where anomalous behavior may appear statistically benign but contextually malicious. This paper presents a graph neural network based adaptive threat detection framework designed to learn latent user-resource interaction patterns from IAM audit trails in real time. By modeling IAM logs as heterogeneous dynamic graphs, the proposed system captures temporal, relational, and contextual dependencies across entities such as users, roles, sessions, and access actions. The model incorporates attention-based aggregation and graph embedding updates to enable continual adaptation to changing cloud environments. Experimental evaluation on synthesized and real-world IAM datasets demonstrates that the proposed method achieves higher detection precision and recall than baseline LSTM and GCN classifiers, while maintaining scalability across multi-tenant cloud environments. The framework's adaptability enables proactive mitigation of insider threats, privilege escalation, and lateral movement attacks, contributing to the foundation of AI-driven zero-trust access analytics. This work bridges the gap between graph-based machine learning and operational cloud security intelligence.
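
As a toy illustration of the input side, here is one way to turn IAM audit events into the heterogeneous, time-stamped edge list such a GNN would consume; the event fields below are invented and do not follow any real IAM schema.

```python
# Hypothetical sketch: group IAM audit events into typed, time-stamped
# edges keyed by (src_type, relation, dst_type), the usual hetero-graph form.
from collections import defaultdict

events = [
    {"user": "alice", "action": "AssumeRole", "resource": "role/admin", "t": 100},
    {"user": "alice", "action": "GetObject",  "resource": "bucket/payroll", "t": 104},
    {"user": "bob",   "action": "GetObject",  "resource": "bucket/payroll", "t": 105},
]

def build_hetero_edges(events):
    edges = defaultdict(list)
    for e in events:
        edges[("user", e["action"], "resource")].append(
            (e["user"], e["resource"], e["t"]))
    return dict(edges)

for rel, es in build_hetero_edges(events).items():
    print(rel, es)
```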

GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation 2025-12-11
Show

Retrieval-augmented generation (RAG) has proven effective in integrating knowledge into large language models (LLMs). However, conventional RAGs struggle to capture complex relationships between pieces of knowledge, limiting their performance in intricate reasoning that requires integrating knowledge from multiple sources. Recently, graph-enhanced retrieval augmented generation (GraphRAG) builds graph structure to explicitly model these relationships, enabling more effective and efficient retrievers. Nevertheless, its performance is still hindered by the noise and incompleteness within the graph structure. To address this, we introduce GFM-RAG, a novel graph foundation model (GFM) for retrieval augmented generation. GFM-RAG is powered by an innovative graph neural network that reasons over graph structure to capture complex query-knowledge relationships. The GFM with 8M parameters undergoes a two-stage training process on large-scale datasets, comprising 60 knowledge graphs with over 14M triples and 700k documents. This results in impressive performance and generalizability for GFM-RAG, making it the first graph foundation model applicable to unseen datasets for retrieval without any fine-tuning required. Extensive experiments on three multi-hop QA datasets and seven domain-specific RAG datasets demonstrate that GFM-RAG achieves state-of-the-art performance while maintaining efficiency and alignment with neural scaling laws, highlighting its potential for further improvement.

Accepted by NeurIPS 2025

Efficient Exact Subgraph Matching via GNN-based Path Dominance Embedding (Technical Report) 2025-12-11
Show

The classic problem of exact subgraph matching returns those subgraphs in a large-scale data graph that are isomorphic to a given query graph, which has gained increasing importance in many real-world applications. In this paper, we propose a novel and effective graph neural network (GNN)-based path embedding framework (GNN-PE), which allows efficient exact subgraph matching without introducing false dismissals. Unlike traditional GNN-based graph embeddings that only produce approximate subgraph matching results, in this paper, we carefully devise GNN-based embeddings for paths, such that: if two paths (and 1-hop neighbors of vertices on them) have the subgraph relationship, their corresponding GNN-based embedding vectors will strictly follow the dominance relationship. With such a newly designed property of path dominance embeddings, we are able to propose effective pruning strategies based on path label/dominance embeddings and guarantee no false dismissals for subgraph matching. We build multidimensional indexes over path embedding vectors, and develop an efficient subgraph matching algorithm by traversing indexes over graph partitions in parallel and applying our pruning methods. We also propose a cost-model-based query plan that obtains query paths from the query graph with low query cost. To further optimize our GNN-PE approach, we also propose a more efficient GNN-based path group embedding (GNN-PGE) technique, which performs subgraph matching over grouped path embedding vectors. We design effective pruning strategies (w.r.t. grouped path embeddings) that can significantly reduce the search space during the index traversal. Through extensive experiments, we confirm the efficiency and effectiveness of our proposed GNN-PE and GNN-PGE approaches for exact subgraph matching on both real and synthetic graph data.
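
The dominance property that guarantees no false dismissals reduces to a coordinate-wise comparison, sketched below with made-up embedding vectors; the real system obtains these vectors from its trained GNN and serves them from multidimensional indexes rather than a linear scan.

```python
# Sketch of the pruning test: if query path q is a subgraph of data path p,
# q's embedding is coordinate-wise dominated by p's, so any p failing the
# test can be pruned without false dismissals.
import numpy as np

def dominates(p_emb, q_emb):
    """True iff q_emb <= p_emb in every coordinate."""
    return bool(np.all(q_emb <= p_emb))

def prune_candidates(data_embs, q_emb):
    return [i for i, p in enumerate(data_embs) if dominates(p, q_emb)]

data_embs = np.array([[3.0, 2.0, 5.0], [1.0, 4.0, 2.0], [2.0, 2.0, 2.0]])
q_emb = np.array([2.0, 2.0, 2.0])
print(prune_candidates(data_embs, q_emb))   # [0, 2]
```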

Magnitude-Modulated Equivariant Adapter for Parameter-Efficient Fine-Tuning of Equivariant Graph Neural Networks 2025-12-11
Show

Pretrained equivariant graph neural networks based on spherical harmonics offer efficient and accurate alternatives to computationally expensive ab-initio methods, yet adapting them to new tasks and chemical environments still requires fine-tuning. Conventional parameter-efficient fine-tuning (PEFT) techniques, such as Adapters and LoRA, typically break symmetry, making them incompatible with those equivariant architectures. ELoRA, recently proposed, is the first equivariant PEFT method. It achieves improved parameter efficiency and performance on many benchmarks. However, the relatively high degrees of freedom it retains within each tensor order can still perturb pretrained feature distributions and ultimately degrade performance. To address this, we present Magnitude-Modulated Equivariant Adapter (MMEA), a novel equivariant fine-tuning method which employs lightweight scalar gating to modulate feature magnitudes on a per-order and per-multiplicity basis. We demonstrate that MMEA preserves strict equivariance and, across multiple benchmarks, consistently improves energy and force predictions to state-of-the-art levels while training fewer parameters than competing approaches. These results suggest that, in many practical scenarios, modulating channel magnitudes is sufficient to adapt equivariant models to new chemical environments without breaking symmetry, pointing toward a new paradigm for equivariant PEFT design.

Accepted at AAAI 2026 (Poster)
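
Below is a sketch of per-order magnitude gating in plain PyTorch, under our own feature-layout convention (orders concatenated, with order `l` holding `mults[l]` blocks of dimension `2l+1`); rescaling a whole block by one scalar commutes with rotations, which is why equivariance is preserved.

```python
# Hypothetical sketch: one learned scalar per (order, multiplicity) channel
# rescales its entire (2l+1)-dim block of spherical-tensor features.
import torch
import torch.nn as nn

class MagnitudeGate(nn.Module):
    def __init__(self, mults):                 # mults[l] = channels of order l
        super().__init__()
        self.mults = mults
        self.gates = nn.ParameterList(
            [nn.Parameter(torch.ones(m)) for m in mults])

    def forward(self, x):                      # x: concatenated order blocks
        out, i = [], 0
        for l, m in enumerate(self.mults):
            dim = 2 * l + 1
            block = x[..., i:i + m * dim].reshape(*x.shape[:-1], m, dim)
            out.append((self.gates[l][:, None] * block).flatten(-2))
            i += m * dim
        return torch.cat(out, dim=-1)

gate = MagnitudeGate(mults=[4, 2, 1])          # 4 scalars, 2 vectors, 1 l=2
x = torch.randn(8, 4 * 1 + 2 * 3 + 1 * 5)      # batch of 8, feature dim 15
print(gate(x).shape)                           # torch.Size([8, 15])
```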

Galaxy Phase-Space and Field-Level Cosmology: The Strength of Semi-Analytic Models 2025-12-11
Show

Semi-analytic models are a widely used approach to simulate galaxy properties within a cosmological framework, relying on simplified yet physically motivated prescriptions. They have also proven to be an efficient alternative for generating accurate galaxy catalogs, offering a faster and less computationally expensive option compared to full hydrodynamical simulations. In this paper, we demonstrate that using only galaxy $3$D positions and radial velocities, we can train a graph neural network coupled to a moment neural network to obtain a robust machine learning based model capable of estimating the matter density parameter, $\Omega_{\rm m}$, with a precision of approximately 10%. The network is trained on ($25 h^{-1}$Mpc)$^3$ volumes of galaxy catalogs from L-Galaxies and can successfully extrapolate its predictions to other semi-analytic models (GAEA, SC-SAM, and Shark) and, more remarkably, to hydrodynamical simulations (Astrid, SIMBA, IllustrisTNG, and SWIFT-EAGLE). Our results show that the network is robust to variations in astrophysical and subgrid physics, cosmological and astrophysical parameters, and the different halo-profile treatments used across simulations. This suggests that the physical relationships encoded in the phase-space of semi-analytic models are largely independent of their specific physical prescriptions, reinforcing their potential as tools for the generation of realistic mock catalogs for cosmological parameter inference.

23 pages, 5 figures
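
The moment-network piece can be sketched independently of the graph network: one head regresses the posterior mean, a second regresses the variance from the squared residual. The two-term loss below follows the standard moment-network recipe; the pooled features `h` stand in for the GNN output, and all dimensions are illustrative.

```python
# Sketch of a moment network head: predict mean and variance of Omega_m
# from pooled features; the variance head learns the squared residual.
import torch
import torch.nn as nn

class MomentHead(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.mean = nn.Linear(d, 1)
        self.var = nn.Sequential(nn.Linear(d, 1), nn.Softplus())  # var > 0

    def forward(self, h):
        return self.mean(h).squeeze(-1), self.var(h).squeeze(-1)

def moment_loss(mu, var, theta):
    r2 = (mu - theta) ** 2
    return r2.mean() + ((r2.detach() - var) ** 2).mean()

h = torch.randn(32, 64)                  # stand-in for pooled GNN features
theta = torch.rand(32) * 0.3 + 0.1       # fake Omega_m targets
mu, var = MomentHead(64)(h)
print(moment_loss(mu, var, theta))
```
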
Modeling Narrative Archetypes in Conspiratorial Narratives: Insights from Singapore-Based Telegram Groups 2025-12-10
Show

Conspiratorial discourse is increasingly embedded within digital communication ecosystems, yet its structure and spread remain difficult to study. This work analyzes conspiratorial narratives in Singapore-based Telegram groups, showing that such content is woven into everyday discussions rather than confined to isolated echo chambers. We propose a two-stage computational framework. First, we fine-tune RoBERTa-large to classify messages as conspiratorial or not, achieving an F1-score of 0.866 on 2,000 expert-labeled messages. Second, we build a signed belief graph in which nodes represent messages and edge signs reflect alignment in belief labels, weighted by textual similarity. We introduce a Signed Belief Graph Neural Network (SiBeGNN) that uses a Sign Disentanglement Loss to learn embeddings that separate ideological alignment from stylistic features. Using hierarchical clustering on these embeddings, we identify seven narrative archetypes across 553,648 messages: legal topics, medical concerns, media discussions, finance, contradictions in authority, group moderation, and general chat. SiBeGNN yields stronger clustering quality (cDBI = 8.38) than baseline methods (13.60 to 67.27), supported by 88 percent inter-rater agreement in expert evaluations. Our analysis shows that conspiratorial messages appear not only in clusters focused on skepticism or distrust, but also within routine discussions of finance, law, and everyday matters. These findings challenge common assumptions about online radicalization by demonstrating that conspiratorial discourse operates within ordinary social interaction. The proposed framework advances computational methods for belief-driven discourse analysis and offers applications for stance detection, political communication studies, and content moderation policy.

M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks 2025-12-10
Show

The rise of 5G/6G network technologies promises to enable applications like autonomous vehicles and virtual reality, resulting in a significant increase in connected devices and necessarily complicating network management. Even worse, these applications often have strict, yet heterogeneous, performance requirements across metrics like latency and reliability. Much recent work has thus focused on developing the ability to predict network performance. However, traditional methods for network modeling, like discrete event simulators and emulation, often fail to balance accuracy and scalability. Network Digital Twins (NDTs), augmented by machine learning, present a viable solution by creating virtual replicas of physical networks for real-time simulation and analysis. State-of-the-art models, however, fall short of full-fledged NDTs, as they often focus only on a single performance metric or simulated network data. We introduce M3Net, a Multi-Metric Mixture-of-experts (MoE) NDT that uses a graph neural network architecture to estimate multiple performance metrics from an expanded set of network state data in a range of scenarios. We show that M3Net significantly enhances the accuracy of flow delay predictions by reducing the MAPE (Mean Absolute Percentage Error) from 20.06% to 17.39%, while also achieving 66.47% and 78.7% accuracy on per-flow jitter and packets dropped, respectively.
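
A minimal mixture-of-experts readout in the spirit described (shared experts, a gating network, and one head per metric); the dimensions, expert count, and metric names below are illustrative assumptions, not M3Net's actual configuration.

```python
# Hypothetical sketch: gated mixture of shared experts with one linear
# head per performance metric, applied to per-flow GNN embeddings.
import torch
import torch.nn as nn

class MoEReadout(nn.Module):
    def __init__(self, d, n_experts=4, metrics=("delay", "jitter", "drops")):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d, d), nn.ReLU())
             for _ in range(n_experts)])
        self.gate = nn.Linear(d, n_experts)
        self.heads = nn.ModuleDict({m: nn.Linear(d, 1) for m in metrics})

    def forward(self, h):                        # h: per-flow embeddings
        w = torch.softmax(self.gate(h), dim=-1)  # (batch, n_experts)
        mix = sum(w[:, i:i + 1] * e(h) for i, e in enumerate(self.experts))
        return {m: head(mix).squeeze(-1) for m, head in self.heads.items()}

h = torch.randn(16, 32)                          # stand-in for GNN output
out = MoEReadout(32)(h)
print({k: v.shape for k, v in out.items()})
```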

A Network Science Approach to Granular Time Series Segmentation 2025-12-10
Show

Time series segmentation (TSS) is a time series (TS) analysis technique that has received considerably less attention than other TS-related tasks. In recent years, deep learning architectures have been introduced for TSS, but their reliance on sliding windows limits segmentation granularity due to fixed window sizes and strides. To overcome these challenges, we propose a new, more granular TSS approach that transforms the TS into a graph using the Weighted Dual Perspective Visibility Graph (WDPVG) and combines it with a Graph Attention Network (GAT). By transforming TS into graphs, we are able to capture structural aspects of the data that would otherwise remain hidden. By utilizing the representation learning capabilities of Graph Neural Networks, our method is able to effectively identify meaningful segments within the TS. To better understand the potential of our approach, we also experimented with different TS-to-graph transformations and compared their performance. Our contributions include: a) formulating TSS as a node classification problem on graphs; b) conducting an extensive analysis of various TS-to-graph transformations applied to TSS using benchmark datasets from the TSSB repository; c) providing the first detailed study on utilizing GNNs for analyzing graph representations of TS in the context of TSS; d) demonstrating the effectiveness of our method, which achieves an average F1 score of 0.97 across 59 diverse TSS benchmark datasets; e) outperforming the seq2point baseline method by 0.05 in terms of F1 score; and f) reducing the required training data compared to the baseline methods.

20 pages, 11 figures
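
The TS-to-graph step can be illustrated with the natural visibility graph, the simplest relative of the WDPVG used in the paper; segmentation then becomes node classification on this graph. The O(n^2) loop below is for clarity, not efficiency.

```python
# Sketch: natural visibility graph of a time series. Samples a and b are
# connected iff every sample between them lies strictly below the line
# joining (a, ts[a]) and (b, ts[b]).
import networkx as nx

def natural_visibility_graph(ts):
    G = nx.Graph()
    G.add_nodes_from(range(len(ts)))
    for a in range(len(ts)):
        for b in range(a + 1, len(ts)):
            if all(ts[c] < ts[a] + (ts[b] - ts[a]) * (c - a) / (b - a)
                   for c in range(a + 1, b)):
                G.add_edge(a, b)
    return G

ts = [1.0, 3.0, 2.0, 4.0, 1.5, 2.5]
print(sorted(natural_visibility_graph(ts).edges))
```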
