research-article
Open access

Score-based Graph Learning for Urban Flow Prediction

Published: 17 May 2024

Abstract

Accurate urban flow prediction (UFP) is crucial for a range of smart city applications such as traffic management, urban planning, and risk assessment. To capture the intrinsic characteristics of urban flow, recent efforts have utilized spatial and temporal graph neural networks to deal with the complex dependence between the traffic in adjacent areas. However, existing graph neural network based approaches suffer from several critical drawbacks, including improper graph representation of urban traffic data, lack of semantic correlation modeling among graph nodes, and coarse-grained exploitation of external factors. To address these issues, we propose DiffUFP, a novel probabilistic graph-based framework for UFP. DiffUFP consists of two key designs: (1) a semantic region dynamic extraction method that effectively captures the underlying traffic network topology, and (2) a conditional denoising score-based adjacency matrix generator that takes spatial, temporal, and external factors into account when constructing the adjacency matrix, rather than simply concatenating them as existing studies do. Extensive experiments conducted on real-world datasets demonstrate the superiority of DiffUFP over state-of-the-art UFP models and the effectiveness of the two modules.

1 Introduction

With the advancement of mobile computing technology such as GPS, cellular, and mobile apps, it has become more and more convenient to collect spatial and temporal data such as human mobility, trajectories of vehicles, and shared bicycles [8, 67], which, in turn, enable the analysis of city traffic and flow movement that play crucial roles in various applications, ranging from intelligent transportation systems and urban planning to public safety (e.g., epidemic spread analysis) and smart city management [66]. As an important and challenging problem, urban flow prediction (UFP) has received increasing attention and has been extensively studied in recent years [21, 23, 26, 27, 34, 42, 59, 61].
Because of the inherent spatio-temporal property of urban flow, early approaches [29, 60] use convolutional neural networks (CNNs) to learn the spatial correlation, whereas recurrent neural networks (RNNs) such as LSTM and GRUs are typically used to capture the dynamic temporal correlation. Later, researchers began to exploit various advanced deep learning models like residual networks [26, 59] and meta-learning [36] to better capture the spatio-temporal correlations between different areas. Graph neural networks (GNNs) have recently demonstrated their ability to effectively represent graph-structured data and have been widely adopted in spatio-temporal learning [11, 12, 53]. As a result, a number of GNN-based methods for urban traffic analysis and flow prediction have been proposed [11, 56, 63]. These methods, in general, formulate UFP as a Spatio-Temporal Graph (STG) prediction problem and have achieved promising results by combining GNNs with some temporal dynamic modeling techniques [36, 61, 62].
Nonetheless, accurate UFP is difficult due to the complex spatio-temporal interactions. Existing graph-based UFP methods still have three flaws that prevent them from addressing the UFP issue effectively.
The first flaw is semantically insufficient or inflexible construction of the urban flow graph. Existing GNN-based UFP models [30, 56, 63] partition the map into grids evenly and take the equal-sized regions as nodes to construct the urban flow graph, as illustrated in Figure 1. They model the inflow and outflow of a node by aggregating the information (i.e., flow) of its immediate neighbor (1-hop) nodes, which allows the GNN to easily handle the input flow data and forecast the output urban flow. However, this general processing method fails to consider the real-world road networks and semantic regions (e.g., schools, parks, and residential areas). For example, several grids may divide a semantic region, and a trajectory may lead to multi-hop message passing in GNNs and therefore inaccurate predictions. Besides, grid partition typically leads to a large-scale graph that requires more complex GNNs to capture the underlying interactions between nodes and costs more intensive computations.
Fig. 1. Illustration of map partition. A semantic zone (e.g., Temple of Heaven in Beijing) spans several grids. Therefore, a simple user trajectory requires multi-hop message passing in GNNs.
To address the shortcomings of grid-based urban flow graphs, researchers have suggested region-based urban flow graphs that conduct relational inference at a higher semantic level. For example, a recent study [24] uses Mincut theory to cluster semantically similar regions, which is still semantically inadequate for creating the STG. Sun et al. [42] take irregular regions partitioned according to real-world road networks as nodes to construct the STG. Nevertheless, this approach is inflexible for general STG construction, as it requires additional information about the road networks, which vary from city to city and change over time.
The second flaw is coarse-grained and manually crafted temporal features. The time attribute is another important dimension of urban traffic and a key factor in accurately predicting it. Examining the temporal dynamics allows us to learn the trends and periodicity of urban flow. Previous methods usually follow the pioneering work [59], which extracts time series segments of three different interval lengths from the raw traffic data to reflect three temporal correlations: closeness, periodicity, and trend. For example, ST-GDN [62] uses multi-scale self-attention to explore the multi-level temporal contextual information and to capture the temporal hierarchy of traffic flow regularities. However, existing methods need to manually define the temporal resolutions in advance, even though the temporal property of urban traffic is too complicated to determine all temporal resolutions explicitly. Besides, temporal properties at different levels can influence each other. For example, the daily peaks of inflows and outflows in outdoor stadiums differ between winter and summer, so the seasonal temporal property influences the daily temporal property. Therefore, it is necessary to capture the temporal dynamics flexibly rather than through the manually crafted, coarse-grained time feature engineering of existing studies.
The third flaw is improper usage of external factors. External factors such as weather, weekends, and holidays are important indicators of urban traffic flow [24, 61]. These factors, in essence, affect people’s travel patterns and hence significantly impact flow prediction. For example, flows between residential areas and entertainment regions drop drastically in poor weather. However, most previous solutions incorporate external factors straightforwardly. For instance, some approaches [26, 42, 59, 60] simply fuse the external factors with the flow data and then feed them into fully connected neural networks for prediction. This simplistic utilization disregards the correlations between external factors and urban flow transitions. To study the effects of external influences and land function at the same time, STRN [24] proposes a meta learner that employs matrix factorization to identify relationships between regions and external factors. This approach, however, is constrained by a fixed region partition and does not account for varied combinations of external factors.
To address the preceding challenges, we propose DiffUFP, a novel graph-based framework for UFP. First, we introduce a dynamic semantic region extractor to learn the underlying traffic structure from the Euclidean grid-like urban flow map. It extracts the key semantic regions dynamically and models the interactions between different semantic regions. The resulting dynamic node representations strengthen the message passing between nodes and let the model learn the temporal property of urban traffic smoothly. Second, we design a spatio-temporal adjacency matrix generator to understand the transitions of urban flows under diversified external factors. Specifically, DiffUFP constructs a semantic adjacency matrix to find the similarities of flow data and capture temporal semantic connections. To handle various external factors, we use a denoising score-based model to generate an adjacency matrix conditioned on external factors. Adding a spatial adjacency matrix then yields the final spatio-temporal adjacency matrix. With the node and edge representations produced by these two methods, our graph-based model can effectively deal with the UFP problem.
The contributions of this work are fourfold:
We propose a novel probabilistic GNN-based method for UFP, which constructs the STG to model the flow of urban crowds flexibly and reasonably.
We design a dynamic graph-based node representation method to sufficiently capture the temporal dynamics. Our approach extracts the critical semantic regions from the Euclidean grid-like urban flow map by analyzing the spatio-temporal evolution of urban flow data.
We introduce a conditional denoising score-based method to generate the adjacency matrix from latent semantic space, allowing the model to capture complex spatial and temporal transitions between different regions by taking full advantage of external factors.
We conduct extensive experimental evaluations on real-world datasets collected from cities in different countries and regions. The results demonstrate the superiority of our method, which not only significantly improves the accuracy of UFP but also provides interpretations of the model behaviors.
The rest of this article is organized as follows. In Section 2, we review the related work and position our work in that context. Section 3 formally defines the problem and introduces necessary background knowledge. Section 4 presents the details of our solution and insightful analysis. In Section 5, we compare DiffUFP to the state-of-the-art models under various configurations and demonstrate DiffUFP’s effect in solving the UFP problem. Finally, we conclude this study and outline future work in Section 6.

2 Related Work

We now review the related studies in UFP, graph-based spatio-temporal data analysis, and diffusion-based generative models, and point out the difference of this work.

2.1 Urban Flow Prediction

As one of the essential tasks in traffic management, UFP has attracted increasing interest in the past decade [57, 69]. Classic machine learning techniques have been widely used to model urban flows, such as ARIMA [35] and support vector regression [3]. However, the conventional machine learning models suffer from the underfitting problem due to their poor ability to learn high-dimensional representations and the requirements of expensive feature engineering. Recently, deep learning methods have been introduced to improve the performance of learning non-linear relations and complex interactions between flows in different areas. Earlier research applies RNN and its variants (e.g., LSTM and GRU) to capture the temporal features of urban flow sequences [28, 58]. For example, DeepUrbanEvent [19] builds the system with RNN to predict the crowd dynamics at big events. Nevertheless, the preceding methods fail to consider intrinsic spatial correlations.
Zhang et al. [60] leverage CNNs to extract the spatial correlations between grid regions on the citywide urban flow map. To model the spatial information, the extended works combine RNNs and CNNs [29, 54] or utilize residual neural networks [13] to solve the long-term dependencies. For example, ST-ResNet [59] exploits the residual connection to alleviate the overfitting problem in UFP. DeepSTN+ [26] employs a modified convolutional network structure to model the long-range spatial correlations among crowd flows. In addition, some researchers take the transitions of urban flows as another task to perform multi-task UFP. For instance, Zhang et al. [61] propose a multi-task framework to simultaneously predict the node flow and edge flow. MT-ASTN [48] designs a shared-private framework to predict crowd flows and the origin-destination locations of the crowd flow in an adversarial learning manner. Some hybrid approaches have also been proposed and significantly improved urban flow prediction. For example, ST-MetaNet+[37] uses meta-learning to learn the traffic-related embeddings of nodes and edges from the geo-graph attributes and the traffic context from the dynamic traffic states. STDEN [17] models the spatio-temporal dynamic process of the potential energy field as a differential equation network to integrate physical principles and data-driven models. DeepCrowd [18] proposes a high-dimensional and pyramid architecture attention mechanism based on convolutional LSTM to deal with human mobility in a dataset generated from real-world smartphone applications.
With the increasing interest in acquiring fine-grained urban flow data from coarse-grained urban flow data, a few studies focus on the fine-grained urban flow inference (FUFI) problem [51, 52]. UrbanFM [23] is the first study that formalizes the FUFI problem, which employs deep residual architecture [13] as the backbone and designs a normalization method to capture the spatial constraint. FODE [65] takes advantage of neural ordinary differential equations to balance inference accuracy and computational efficiency. UrbanPy [34] adopts a pyramid architecture to solve the large-scale upsampling issues in FUFI. Liang et al. [25] propose a universal method called DeepLGR that can address the UFP and FUFI simultaneously, which consists of a global module for holistic information learning and a local module to capture the nearby information. In this article, we focus on the UFP problem. However, our method can be easily adapted to address the FUFI problem.

2.2 Graph-Based Spatio-Temporal Learning

GNNs have proved to be a helpful framework for modeling non-Euclidean graph-structured data [31, 49, 68]. Recently, exploiting GNNs for spatio-temporal data mining has been studied extensively, among which traffic speed prediction is a typical spatio-temporal learning task. For example, Yu et al. [56] integrate a graph convolutional network (GCN) [7] and a convolutional sequence model to capture the spatial and temporal correlations, respectively. Yao et al. [55] combine CNNs and a graph embedding method to extract the spatio-temporal signals. STDN [54] learns the transition regularities of the traffic flow by exploiting periodically shifted attention. Geng et al. [11] employ a multi-modal GCN to learn region-wise interactions, including those between neighboring regions. Wang et al. [50] propose a spatio-temporal GNN with a learnable positional attention mechanism to capture spatial and temporal patterns comprehensively. Zheng et al. [63] use the graph multi-attention mechanism and an encoder-decoder framework for traffic prediction. Wang et al. [47] introduce the attention mechanism to aggregate information from adjacent roads. To overcome the limit of network depth and improve the capacity of capturing longer-range spatio-temporal correlations, STGODE [10] proposes a continuous representation method of GNNs by utilizing a tensor-based ordinary differential equation.
The graph-based methods have also been used in UFP, and the grid regions of the urban flow map are generally considered as nodes [61]. Pan et al. [36] utilize the meta-learning method to perform knowledge transfer across regions using a recurrent graph attentive network. Zhang et al. [62] propose a graph-based framework that embeds multi-level temporal contextual signals by a multi-scale self-attention network. Notwithstanding the promising improvements achieved by these approaches, they ignore the gap between grid-like urban flow map partition and real-world road networks. Meanwhile, the semantic information over time and the impact of external factors on the transition pattern of people are not well considered in previous studies. In contrast, our method can extract the traffic network’s underlying structure from the Euclidean grid-like urban flow map and learn the transitions of urban flows through a well-designed conditional denoising score-based model.

2.3 Diffusion-Based Generative Model

Diffusion-based models refer to a family of methods that learn generative models as transition operations of Markov chains. In general, they define a Markov chain of diffusion steps that slowly adds Gaussian noise to data and learn to reverse the diffusion process to generate desired data samples from the noise. There are two kinds of diffusion models: the denoising diffusion probabilistic model (DDPM) [14] and the score-based model [40]. The latter optimizes the score matching objective [16], whereas DDPM optimizes a variational lower bound on the log-likelihood.
Recently, deep diffusion models have shown strong performance in a range of tasks such as image generation [14, 40] and audio processing [5, 20, 44, 45]. For example, Kong et al. [20] and Chen et al. [5] use DDPM to generate high-fidelity audio conditional on mel-spectrograms. Ho et al. [14] achieve comparable and even better sample quality than the GAN-based image generation method [2]. Song and Ermon [40] designed a noise conditional score-based network and later provided a unified framework exploring the stochastic differential equation to improve score-based generative models [41], where the diffusion process is modeled as the discretization of a continuous stochastic differential equation. Dockhorn et al. [9] propose a well-designed forward diffusion process by augmenting the data variable with an additional velocity variable, which brings a smoother diffusion process and requires fewer sampling overheads. Chao et al. [4] formulate a new training objective to assist the classifier in matching the gradients of the authentic log-likelihood density under conditional situations. Besides, Rasul et al. [38] utilize DDPM for time series forecasting and show excellent performance. Tashiro et al. [46] successfully employ a score-based model to handle probabilistic time series imputation. Niu et al. [33] adopt a score-based model to model the graph and propose a permutation equivariant GNN. However, leveraging the diffusion-based model to handle the complicated spatio-temporal dependence is still under exploration.

3 Preliminaries

In this section, we first define the UFP problem studied in this work and introduce the necessary background of score-based models. The frequently used notations throughout this article are summarized in Table 1.
Table 1. Notations

| Notation | Description |
| --- | --- |
| \(I, J\) | number of rows and columns of meshing city flow map |
| \(V=\lbrace r_{i,j}\rbrace\) | grid regions set, \(1 \le i \le I, 1 \le j \le J\) |
| \(N\) | number of grid regions (i.e., \(N = I \times J\)) |
| \(\mathcal {P}\) | traffic trajectories set |
| \(T_r\) | a trajectory of \(\mathcal {P}\) |
| \(L_{T_r}\) | number of coordinate points contained in \(T_r\) |
| \(\mathcal {T}\) | available time interval set |
| \(g\) | geospatial coordinate (i.e., longitude and latitude) |
| \(m_{t, i, j}^{in}, m_{t, i, j}^{out}\) | inflow and outflow of region \(r_{i,j}\) at time \(t\) |
| \(\tau _{in}, \tau _{\it out}\) | number of timestamps for history/future urban flow |
| \(\mathbf {M}^S, \mathbf {M}^T\) | historical urban flow and future urban flow |
| \(\mathbf {E}\) | historical external factors |
| \(\mathbf {m}_{t}, \mathbf {e}_t\) | tensor of node flow data and external factors at time \(t\) |
| \(N_r\) | number of excavated semantic regions |
| \(\mathbf {X}^C, \mathbf {X}^P, \mathbf {X}^T\) | fragment of historic urban flows that denotes near history, recent time, and weekly periodicity, respectively |
| \(\mathbf {X}^f\) | fused historic sequence of urban flows |
| \(\mathbf {X}^{\prime }\) | dynamically extracted nodes representations |
| \(\mathbf {X}\) | fused nodes representations |
| \(\mathbf {A}^{sp}\) | spatial adjacency matrix |
| \(\mathbf {A}^{te}\) | temporal semantic adjacency matrix |
| \(\mathbf {A}^{\prime }\) | conditional generative semantic adjacency matrix |
| \(\mathbf {A}^{st}\) | spatio-temporal semantic adjacency matrix |

3.1 Problem Formulation

We consider a city partitioned into \(N\) (\(N = I \times J\)) equal-sized cell regions based on longitude and latitude. Let \(V=\lbrace r_{1,1},\ldots ,r_{i,j},\ldots ,r_{I,J}\rbrace\) denote all cell regions, where \(r_{i,j}\) represents the cell region in the i-th row and j-th column of the grid map.
Definition 1 (Urban Flow).
Let \(\mathcal {T} = \left\lbrace t_1,\ldots ,t_{|\mathcal {T}|}\right\rbrace\) be a sequence of time intervals, \(| \cdot |\) denote the cardinality of the set, and \(\mathcal {P}\) be the collection of urban flow trajectories. Given a cell region \(r_{i,j}\), the corresponding inflow \(m_{t, i, j}^{in}\) and outflow \(m_{t, i, j}^{out}\) of the urban traffic in time slot t are defined as follows:
\[\begin{aligned} m_{t, i, j}^{in} &=\sum _{T_{r} \in \mathcal {P}} \left|\left\lbrace g_{l} \mid g_{l-1} \notin r_{i, j} \wedge g_{l} \in r_{i, j} \wedge \tau _{l-1} \in t \wedge l > 1\right\rbrace \right|, \\ m_{t, i, j}^{out} &=\sum _{T_{r} \in \mathcal {P}} \left|\left\lbrace g_{l} \mid g_{l} \in r_{i, j} \wedge g_{l+1} \notin r_{i, j} \wedge \tau _{l} \in t \wedge l < L_{T_r}-1\right\rbrace \right|, \end{aligned}\]
where \(T_r : g_1 \rightarrow g_2 \rightarrow \ldots \rightarrow g_{L_{T_r}}\) is a trajectory in \(\mathcal {P}\), \(g_l\) is its \(l\)-th geospatial coordinate with \(l \in [1,L_{T_r}]\), \(\tau _l\) is the corresponding timestamp, and \(L_{T_r}\) denotes the length of \(T_r\). Note that \(g_l \in r_{i,j}\) means the object (e.g., a person, taxi, or bicycle) is within region \(r_{i,j}\). The inflow and outflow of all regions at time t are denoted as a crowd flow tensor \(\mathbf {m}_t \in \mathbb {R}^{2 \times I \times J }\). Following previous studies [61], we flatten \(\mathbf {m}_t\) to a two-dimensional tensor with the shape of \(1 \times 2N\).
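As a rough illustration of Definition 1, the sketch below counts boundary crossings between consecutive trajectory points. It assumes each point has already been mapped to a time-slot index and a grid cell; the function name `count_flows` and the input layout are hypothetical, and the handling of the first and last points of a trajectory is simplified (every consecutive pair is counted).

```python
import numpy as np

def count_flows(trajectories, n_slots, I, J):
    """Count inflow/outflow per time slot and cell.

    trajectories: list of trajectories; each is a list of (slot, (i, j)) pairs,
    i.e. points already mapped to a time-slot index and a grid cell (assumption).
    Returns an array of shape (n_slots, 2, I, J): channel 0 inflow, 1 outflow.
    """
    m = np.zeros((n_slots, 2, I, J), dtype=np.int64)
    for traj in trajectories:
        # Walk over consecutive point pairs (g_l, g_{l+1}).
        for (slot, src), (_, dst) in zip(traj, traj[1:]):
            if src == dst:
                continue  # no boundary crossing between these two points
            # Both counts use the time slot of the earlier point of the pair.
            m[slot, 1, src[0], src[1]] += 1  # object leaves src
            m[slot, 0, dst[0], dst[1]] += 1  # object enters dst
    return m

# Two toy trajectories on a 2 x 2 grid with two time slots.
trajs = [
    [(0, (0, 0)), (0, (0, 1)), (1, (1, 1))],
    [(0, (1, 1)), (1, (1, 1)), (1, (0, 1))],
]
m = count_flows(trajs, n_slots=2, I=2, J=2)
m_flat = m.reshape(2, 1, -1)  # each m_t flattened to shape (1, 2N)
```

Because every crossing leaves one cell and enters another, the total inflow and total outflow over the whole map are equal in this sketch.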
Urban traffic is closely related to external factors, such as time of day, events (e.g., holidays, weekends) [64], and weather conditions (temperature, wind speed, etc.) [42]. At time t, we denote these external factors as \(\mathbf {e}_t \in \mathbb {R}^{l_e}\).
Now we can formally define the problem of UFP.
Problem 1 (Urban Flow Prediction).
Given the historical traffic flow \(\mathbf {M}^S=\left(\mathbf {m}_{t-\tau _{\it in}+1}, \ldots , \mathbf {m}_{t}\right)\), and the corresponding external factors \(\mathbf {E}=\left(\mathbf {e}_{t-\tau _{\it in}+1}, \ldots , \mathbf {e}_{t}\right)\), a UFP model \(f_\text{UFP}\) tries to forecast the traffic flow \(\mathbf {M}^T=\left(\hat{\mathbf {m}}_{t+1}, \ldots , \hat{\mathbf {m}}_{t + \tau _{\it out}}\right)\) in the future \(\tau _{\it out}\) timesteps:
\begin{align} \mathbf {M}^T = f_\text{UFP}(\mathbf {M}^S, \mathbf {E}). \end{align}
(1)
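To make the input and output of Problem 1 concrete, the following sketch slices a flow sequence into training pairs. It assumes the flattened flows are stored as an array of shape (T, 2N) and the external factors as (T, l_e), and that the target covers the \(\tau _{\it out}\) steps immediately after \(t\); `make_windows` is a hypothetical helper, not part of DiffUFP.

```python
import numpy as np

def make_windows(flows, ext, tau_in, tau_out):
    """Build (M_S, E, M_T) samples for every valid reference time t.

    flows: (T, 2N) flattened flow tensors; ext: (T, l_e) external factors.
    """
    T = flows.shape[0]
    M_S, E, M_T = [], [], []
    for t in range(tau_in - 1, T - tau_out):
        M_S.append(flows[t - tau_in + 1 : t + 1])   # m_{t-tau_in+1} .. m_t
        E.append(ext[t - tau_in + 1 : t + 1])       # e_{t-tau_in+1} .. e_t
        M_T.append(flows[t + 1 : t + 1 + tau_out])  # the next tau_out frames
    return np.stack(M_S), np.stack(E), np.stack(M_T)

flows = np.arange(60).reshape(10, 6)   # toy data: T = 10, 2N = 6
ext = np.arange(20).reshape(10, 2)     # toy data: l_e = 2
M_S, E, M_T = make_windows(flows, ext, tau_in=4, tau_out=2)
```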

3.2 Score-Based Generative Model

Given a dataset where each point is drawn independently from an underlying data distribution \(p(\mathbf {x})\), we call \(\nabla _{\mathbf {x}} \log p(\mathbf {x})\) its score function. Because taking the gradient of the log-density eliminates the normalizing constant, the score function does not depend on the partition function and is, in many cases, easier to estimate and model than the probability density function.
A score-based model trains a neural network \(\mathbf {s}_{\boldsymbol {\theta }}(\mathbf {x})\) parameterized by \(\boldsymbol {\theta }\), called the score network, such that \(\mathbf {s}_{\boldsymbol {\theta }}(\mathbf {x}) \approx \nabla _{\mathbf {x}} \log p(\mathbf {x})\). Once the score function is known, Langevin dynamics provides an MCMC (Markov chain Monte Carlo) procedure that samples from the corresponding distribution using only its score function. Estimating the score function from data and then generating new samples with Langevin dynamics is known as score-based generative modeling.
However, score estimates are often inaccurate in low-density regions, where few data points are available for computing the score matching objective. As a remedy, NCSN [40] perturbs the data with Gaussian noise of different scales and uses a single neural network, conditioned on the noise level, to jointly estimate the score functions of all noise-perturbed data distributions. Ideally, with enough data and model capacity, one can obtain the optimal score network. Samples are then generated by annealed Langevin dynamics, a modification of Langevin dynamics that combines information from all noise scales.
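The sampling procedure can be sketched on a toy problem where the score is known in closed form, so no network is needed. The example assumes Gaussian data \(\mathcal {N}(\boldsymbol {\mu }, s_0^2 \mathbf {I})\), whose noise-perturbed score is analytic, and uses the step-size schedule \(\alpha _i = \epsilon \sigma _i^2 / \sigma _L^2\) commonly paired with annealed Langevin dynamics; all names and hyperparameter values are illustrative and are not taken from DiffUFP.

```python
import numpy as np

def perturbed_score(x, mu, s0, sigma):
    """Score of N(mu, (s0^2 + sigma^2) I): Gaussian data blurred by noise sigma."""
    return -(x - mu) / (s0**2 + sigma**2)

def annealed_langevin(score, sigmas, n_samples, dim, eps=0.002, steps=100, seed=0):
    """Run Langevin updates at each noise level, from the largest to the smallest."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-8.0, 8.0, size=(n_samples, dim))  # arbitrary starting points
    for sigma in sigmas:
        alpha = eps * sigma**2 / sigmas[-1]**2          # step size for this level
        for _ in range(steps):
            z = rng.standard_normal(x.shape)
            # Langevin update: drift along the score plus injected Gaussian noise.
            x = x + 0.5 * alpha * score(x, sigma) + np.sqrt(alpha) * z
    return x

mu, s0 = np.array([2.0, -1.0]), 0.5
sigmas = np.geomspace(5.0, 0.1, 10)                     # decreasing noise scales
samples = annealed_langevin(
    lambda x, s: perturbed_score(x, mu, s0, s), sigmas, n_samples=4000, dim=2
)
```

In a learned model, `perturbed_score` would be replaced by the trained noise-conditional score network; the sampling loop itself stays the same.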

4 Methodologies

Figure 2 illustrates the overall framework of DiffUFP. As shown in the left part of Figure 2, the input of the node representation learning module includes closeness \(\mathbf {X}^C\), period \(\mathbf {X}^P\), and trend \(\mathbf {X}^T\), following previous studies [59]. These three time series are extracted from \(\mathbf {M}^S\) and capture three different properties of urban traffic flow. We first leverage multi-head self-attention to mine the critical information of each sequence. Then, the three outputs are concatenated and fed into a Multi-Layer Perceptron (MLP) to produce the fused input sequence \(\mathbf {X}^f\). After that, \(\mathbf {X}^f\) is fed into the dynamic semantic region extractor to generate \(\mathbf {X}^{\prime }\), which is added element-wise to \(\mathbf {X}^f\) to produce more expressive node representations \(\mathbf {X}\). The edge representation learning module takes external factors \(\mathbf {E}\) and \(\mathbf {X}\) as inputs, and a spatio-temporal adjacency matrix generator produces edge representations \(\mathbf {A}^{st}\) that carry rich spatio-temporal semantic information. Finally, the edge representation \(\mathbf {A}^{st}\) and the node representation \(\mathbf {X}\) are fed into two STG layers to predict the future flow. In the following, we explain the main components of DiffUFP: the dynamic semantic region extractor (Section 4.1) and the spatio-temporal adjacency matrix generator (Section 4.2).
Fig. 2. The framework of DiffUFP. It exploits three key flow features (i.e., trend, period, and closeness) to create the flow inputs. A dynamic semantic region extractor is also proposed to derive the node representations. External factors are fed into the spatio-temporal adjacency matrix generator to learn edge representations. Our cascaded spatio-temporal GNN (STG) layers utilize the node and edge representations to extract the higher-level features.
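One common way to extract the closeness, period, and trend segments is to pick the most recent frames, the same slot on previous days, and the same slot in previous weeks. The sketch below follows that convention as an assumption; the segment lengths `l_c`, `l_p`, `l_q`, the daily and weekly offsets, and the helper name are illustrative and are not specified in this section.

```python
import numpy as np

def temporal_segments(flows, t, steps_per_day, l_c=3, l_p=2, l_q=2):
    """Pick closeness / period / trend frames that precede target index t.

    flows: array whose first axis is time. Offsets of one day and one week
    are assumptions based on the usual closeness/period/trend convention.
    """
    day, week = steps_per_day, 7 * steps_per_day
    idx_c = [t - k for k in range(l_c, 0, -1)]         # most recent frames
    idx_p = [t - k * day for k in range(l_p, 0, -1)]   # same slot, previous days
    idx_q = [t - k * week for k in range(l_q, 0, -1)]  # same slot, previous weeks
    return flows[idx_c], flows[idx_p], flows[idx_q]

flows = np.arange(700).reshape(700, 1)  # toy sequence: value equals time index
X_C, X_P, X_T = temporal_segments(flows, t=690, steps_per_day=48)
```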

4.1 Dynamic Semantic Region Extractor

In UFP, the explicit traffic structure is usually unavailable, and we only have access to the grid-based traffic distributions. The critical semantic regions change over time with complex spatial and temporal relationships. It is therefore inefficient to simply take grid regions as nodes. Hence, we propose a semantic regions dynamic extraction method to enhance the urban flow representations. Our approach learns to discover the well-delimited salient regions in space and time to model the interactions between various semantic entities in urban traffic networks. Semantic nodes associate with such noteworthy regions by themselves, enabling the message passing between nodes to effectively model the urban flow transitions by exploiting the inductive bias.
The main structure of the dynamic semantic region extractor is illustrated in Figure 3. This module first receives the fused input sequence \(\mathbf {X}^f \in \mathbb {R}^{\tau _{in}^{\prime } \times 2 \times I \times J}\) and then infers the locations and sizes of \(N_r\) salient semantic regions. These semantic regions are then treated as graph nodes and projected to their original positions on the input map. Therefore, the dynamic semantic region extractor consists primarily of two components: a region extractor and a graph processor.
Fig. 3. The structure of the dynamic semantic region extractor. It extracts the salient regions as nodes for analysis of urban crowd flows. At each timestamp t, the dynamic semantic region extractor learns parameters \(\Delta x_i, \Delta y_i\) and \(w_i, h_i\) to denote the center location and the region size of each node i. The corresponding kernel function \(\mathcal {K}_i\) is used to extract the local features from the corresponding region of \(\mathbf {X}_t^f\). Finally, the nodes are processed by a spatio-temporal GNN and projected into the urban flow map.

4.1.1 Region Extractor.

The region extractor aggregates the input features to extract the salient semantic regions that are parameterized by their location \((\Delta x,\Delta y)\) and size \((w, h)\). Two convolution layers are used to capture the local information from the input while preserving sufficient positional information at each timestep. For each semantic region node, we use a fully connected network (FCN) \(f_i\) to generate the latent representation \(\hat{\mathbf {h}}_{i, t}\):
\begin{align} \hat{\mathbf {h}}_{i, t}={f_i}\Big (\text{CNN}\Big (\mathbf {{X}}^{f}_{t}\Big)\Big) \in \mathbb {R}^{C}, \forall i \in [1, N_r]. \end{align}
(2)
In our experiments, we set the channel dimension C to 2 and the convolution kernel to \(3 \times 3\) with stride 1. To capture the temporal information, we use the GRU [6] to fuse the historical latent representations of semantic regions and thus obtain the temporal hidden state \(\mathbf {z}_{i, t}\):
\begin{align} \mathbf {z}_{i,t} = \text{GRU}\big (\mathbf {z}_{i,t-1},\hat{\mathbf {h}}_{i, t}\big) \in \mathbb {R}^{C}, \forall i \in [1, N_r], \end{align}
(3)
where \(\mathbf {z}_{i,0}\) is a randomly initialized vector. After a linear projection, we get the predicted region parameters:
\begin{align} \Delta x_{i, t}, \Delta y_{i, t}, w_{i, t}, h_{i, t} =\xi \left(W_{o} \mathbf {z}_{i, t}\right) \in \mathbb {R}^{4}, \end{align}
(4)
where \(W_{o} \in \mathbb {R}^{4 \times C}\); \(\xi\) denotes another FCN that controls the initialization of the size and location of the predicted regions; and \((\Delta x_{i, t}, \Delta y_{i, t})\) and \((w_{i, t}, h_{i, t})\) represent the center location and the size of the i-th semantic region at time t, respectively. Each region node is then interpolated onto the grid-based map using a kernel function \(\mathcal {K}\) that is separable along the two axes:
\begin{align} &k_{x}^{i}\left(p_{x}\right)=\max \left(0, w_{i}-\left|\Delta x_{i}-p_{x}\right|\right), \end{align}
(5)
\begin{align} &k_{y}^{i}\left(p_{y}\right)=\max \left(0, h_{i}-\left|\Delta y_{i}-p_{y}\right|\right), \end{align}
(6)
\begin{align} \mathcal {K}^{i}\left(p_{x}, p_{y}\right)=k_{x}^{i}\left(p_{x}\right) k_{y}^{i}\left(p_{y}\right) \in \mathbb {R}, \end{align}
(7)
where \((p_{x}, p_{y})\) indicates a spatial location on the input urban flow map, and \(w_{i}\), \(h_{i}\), \(\Delta x_{i}\), and \(\Delta y_{i}\) are the predicted region parameters from Equation (4): \((w_{i}, h_{i})\) gives the size of region i and \((\Delta x_{i}, \Delta y_{i})\) gives its center location. The position embedding of each node is derived as the kernel-weighted sum of the input features over all grid locations:
\begin{align} \mathbf {pe}_{i}=\sum _{p_{x}=1}^{J} \sum _{p_{y}=1}^{I} \mathcal {K}^{i}\left(p_{x}, p_{y}\right) \mathbf {X}^f_{p_{x}, p_{y}} \in \mathbb {R}^{C}, \forall i \in [1, N_r], \end{align}
(8)
where \(\mathbf {pe}_{i}\) denotes the position embedding of the i-th semantic node.
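Equations (5) to (8) can be sketched in NumPy as follows. The region parameters are passed in directly rather than predicted by the network, they are assumed to be expressed in grid units with 1-based coordinates, and the function names are hypothetical.

```python
import numpy as np

def region_kernel(dx, dy, w, h, I, J):
    """Separable triangular kernel of Eqs. (5)-(7) on an I x J grid -> (I, J)."""
    px = np.arange(1, J + 1)                    # column coordinates p_x
    py = np.arange(1, I + 1)                    # row coordinates p_y
    kx = np.maximum(0.0, w - np.abs(dx - px))   # Eq. (5), shape (J,)
    ky = np.maximum(0.0, h - np.abs(dy - py))   # Eq. (6), shape (I,)
    return np.outer(ky, kx)                     # Eq. (7): product of the two axes

def position_embedding(K, Xf_t):
    """Eq. (8): kernel-weighted sum of features. Xf_t: (C, I, J) -> (C,)."""
    return np.einsum("ij,cij->c", K, Xf_t)

# A region centered at column 3, row 2 with half-width 2 and half-height 1.
K = region_kernel(dx=3.0, dy=2.0, w=2.0, h=1.0, I=4, J=5)
pe = position_embedding(K, np.ones((2, 4, 5)))
```

The kernel is nonzero only within \(w\) columns and \(h\) rows of the center, so each node pools features from a bounded window whose position and extent remain differentiable in the region parameters.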

4.1.2 Graph Processor.

The goal of the graph processor is to map the semantic region nodes to the input city map. It achieves this by a general recurrent STG processing procedure [32]. At each timestep t, each pair of nodes exchanges messages as
\begin{align} \mathbf {pe}_{i, t}=\sum _{j=1}^{N} \kappa \big (\mathbf {pe}_{j, t}, \mathbf {pe}_{i, t}\big) \operatorname{MLP}\big (\big [\mathbf {pe}_{j, t} ; \mathbf {pe}_{i, t}\big ]\big) \in \mathbb {R}^{C}, \end{align}
(9)
where \(\kappa (\cdot ,\cdot)\) indicates the dot product attention operation and MLP is a three-layer perceptron. To incorporate the temporal information, the feature of each node is updated with a shared GRU across timesteps. We remap the semantic region nodes to the city map by multiplying the nodes’ features and the kernel function output:
\begin{align} \mathbf {X^{\prime }}_{t, p_{x}, p_{y}}=\sum _{i=1}^{N} \mathcal {K}_{t}^{i}\left(p_{x}, p_{y}\right) \hat{\mathbf {pe}}_{i, t} \in \mathbb {R}^{C}, \end{align}
(10)
where \(\hat{\mathbf {pe}}_{i, t+1}\) denotes the node feature at the next timestep. The final input feature \(\mathbf {X}\) is generated as
\begin{align} \mathbf {X} = \mathbf {X}^f \oplus \mathbf {X}^{\prime } \in \mathbb {R}^{\tau _{in}^{\prime } \times 2 \times I \times J}, \end{align}
(11)
where \(\oplus\) denotes element-wise addition and \(\mathbf {X}\) is taken as the node representation of the graph-based model.

4.2 Spatio-Temporal Adjacency Matrix Generator

It is crucial to accurately model the transitions in urban traffic flow. Currently, researchers typically create traffic flow graphs by rasterizing urban maps and connecting geographically adjacent regions [37, 61], or they use road networks as a reference [42]. However, these methods leave room for improvement in three aspects. First, most studies only consider static spatial distances and neglect the temporal aspect of urban flow transitions. Second, external factors that directly impact traffic transitions, such as extreme weather conditions or holidays, should be considered during modeling. Last, to accurately predict future urban flows, models must generalize well enough to handle traffic patterns that are absent from their training data.
To address these issues, we design a spatio-temporal adjacency matrix generator that produces a semantically rich adjacency matrix capturing both geographic proximity and semantic similarity. We implement it via a conditional score-based model that can effectively incorporate external factors into adjacency matrix generation. The structure of the spatio-temporal adjacency matrix generator is illustrated in Figure 4. Its inputs are the node representations \(\mathbf {X}\) and the external factors \(\mathbf {E}\). First, we construct the semantic adjacency matrix \(\mathbf {A}^{te}\). Then, a conditional score-based generative model is used to incorporate external factors and complement unseen traffic patterns simultaneously. The output of the conditional score-based generative model is denoted by \(\mathbf {A}^{\prime }\). Finally, we obtain the target spatio-temporal adjacency matrix \(\mathbf {A}^{st}\) by fusing \(\mathbf {A}^{\prime }\) and the spatial adjacency matrix \(\mathbf {A}^{sp}\). The details of the generator are presented as follows. Section 4.2.1 explains how to construct the spatial adjacency matrix, which is the basis of the urban flow graph. Section 4.2.2 establishes the semantic edges based on the input matrix \(\mathbf {X}\). Building on these edges, Section 4.2.3 details the conditional score-based graph generation process that produces the final spatio-temporal semantic adjacency matrix.
Fig. 4. The structure of the spatio-temporal adjacency matrix generator. The temporal semantic adjacency matrix \(\mathbf {A}^{te}\) is first constructed with the semantic adjacency matrix construction (Section 4.2.2). Then the conditional score-based graph generation (Section 4.2.3) is executed to incorporate external factors and complement unseen traffic patterns. Finally, we obtain the spatio-temporal adjacency matrix \(\mathbf {A}^{st}\) after fusing spatial adjacency matrix \(\mathbf {A}^{sp}\) with \(\mathbf {A}^{\prime }\).

4.2.1 Spatial Adjacency Matrix Construction.

We rasterize the urban map to construct the spatial adjacency matrix \(\mathbf {A}^{sp}\) that captures the connectivity between two nodes according to their geographic locations. For a pair of nodes a and b, the corresponding value of \(\mathbf {A}^{sp}\) is set as
\begin{align} \mathbf {A}_{a, b}^{sp}= {\left\lbrace \begin{array}{ll}1,& \text{ if } \left|a_i - b_i \right|\le \epsilon ^{sp} \wedge \left|a_j - b_j \right|\le \epsilon ^{sp} \\ 0,& \text{otherwise }\end{array}\right.}, \end{align}
(12)
where \((a_i,a_j)\) and \((b_i,b_j)\) are the coordinates of nodes a and b in the grid-like urban map, respectively, and \(\epsilon ^{sp}\) is the hyperparameter to control the sparsity of \(\mathbf {A}^{sp}\).
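The rule in Equation (12) can be sketched in a few lines of Python. The row-major flattening of grid cells into node indices and the function name are assumptions made for illustration; the paper does not specify the node ordering.

```python
def spatial_adjacency(rows, cols, eps_sp):
    """Binary adjacency over grid cells following Equation (12).

    Cells are flattened row-major (node index = i * cols + j), which is an
    illustrative choice. As written, the rule also links each node to itself.
    """
    coords = [(i, j) for i in range(rows) for j in range(cols)]
    n = len(coords)
    adj = [[0] * n for _ in range(n)]
    for a, (ai, aj) in enumerate(coords):
        for b, (bi, bj) in enumerate(coords):
            # Connect two cells when both coordinate offsets are within eps_sp.
            if abs(ai - bi) <= eps_sp and abs(aj - bj) <= eps_sp:
                adj[a][b] = 1
    return adj


adj = spatial_adjacency(rows=3, cols=3, eps_sp=1)
center_degree = sum(adj[4])  # middle cell of the 3x3 grid
corner_degree = sum(adj[0])  # top-left cell
```

A larger \(\epsilon ^{sp}\) widens the square neighborhood around each cell and therefore yields a denser matrix.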

4.2.2 Semantic Adjacency Matrix Construction.

In addition to the geographical neighborhood, the semantic neighborhood also influences traffic flow prediction. The semantic neighborhood here refers to the nodes with similar traffic patterns. For instance, regions around shopping malls share similar inflow and outflow patterns regardless of geographical distance. Although other traffic-related information such as the user trajectory would also contribute to traffic patterns, in the case of UFP we can only access the urban flow due to privacy policy. In this work, we propose a semantic adjacency matrix to capture this kind of semantic neighborhood.
For a given node, the traffic pattern can typically be depicted by the inflow and outflow sequences represented as time series. Therefore, traffic similarity can be measured by the inflow and outflow similarity between two nodes. Here we use the Dynamic Time Warping (DTW) [1] algorithm to calculate the similarities of flows (inflow and outflow) in different nodes.
In this way, DiffUFP can obtain more meaningful neighbors enriching the graph representation of traffic flow. It is worth noting that STGODE [10] first proposed the semantic neighbors. However, STGODE only considered the sum of inflow and outflow, which is coarse grained. The specific construction of the semantic adjacency matrix at time t \(\mathbf {A}_{t}^{\it te}\) is as follows:
\begin{equation} \mathbf {A}^{\it te}_{t,i,j} = {\left\lbrace \begin{array}{ll}1, & \operatorname{DTW}\left(S_{t,i,j}^{\it in}, S_{t,i,j}^{\it out}\right)\lt \epsilon ^{te} \\ 0, & \text{ otherwise } \end{array}\right.}, \end{equation}
(13)
where \(\epsilon ^{\it te}\) is the threshold that controls the sparsity of the adjacency matrix, \(S_{t,i,j}^{\it in}\) and \(S_{t,i,j}^{\it out}\) denote the inflow and outflow sequences of cell region \(r_{i,j}\) extracted from \(\mathbf {X}\) around time t, and \(\operatorname{DTW}(\cdot ,\cdot)\) denotes the DTW distance between two sequences. Note that, unlike the static spatial adjacency matrix \(\mathbf {A}^{sp}\), \(\mathbf {A}^{\it te} \in \mathbb {R}^{\tau _{in}^{\prime } \times I \times J}\) is time varying.
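For readers unfamiliar with DTW, the thresholding in Equation (13) can be sketched as follows. This is the textbook dynamic-programming formulation with an absolute-difference local cost and no warping window; both choices, as well as the helper names, are assumptions, since the paper does not state which DTW variant is used.

```python
def dtw_distance(seq_a, seq_b):
    """Classic DTW via dynamic programming with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def semantic_edge(seq_a, seq_b, eps_te):
    """Return 1 when the DTW distance falls below the threshold, else 0."""
    return 1 if dtw_distance(seq_a, seq_b) < eps_te else 0


# The second sequence is a time-stretched copy of the first, so DTW is 0.
stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

Because DTW tolerates local stretching along the time axis, two sequences with the same shape but slightly different timing still count as similar, which a point-wise distance would miss.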

4.2.3 Conditional Score-Based Graph Generation.

Two things are essential when forecasting urban flows. The first is the external factors such as weather and special events, and the other is the unseen traffic patterns.
External factors impact the mobility patterns of urban crowds and hence the transitions of urban flow. However, most existing approaches incorporate external factors in the same way as they embed urban flows, or merely with several layers of feedforward neural networks. For example, they fuse the learned embeddings of external factors with the representation of the traffic flow graph via an addition operation or FCNs [26, 59, 61]. These methods are straightforward but do not consider the fine-grained relations between external factors and urban flow. Here we argue that the external effect should be considered when constructing the urban flow graph rather than only in the final representation learning. Meanwhile, fine-grained traffic patterns can improve UFP, but they are usually absent from the data. On the one hand, the trajectory sampling frequency is too low to derive meaningful patterns in urban data. On the other hand, the inflow and outflow of a region are estimated over a period rather than in real time. Therefore, the urban flow graph derived from them is coarse grained.
In this work, we leverage a score-based generative model to address the preceding concerns simultaneously. As a powerful class of probabilistic generative models, score-based models have been used to enhance graph-based neural networks and have achieved promising performance. For example, Niu et al. [33] propose a permutation equivariant GNN to model the score function of the input graph, which provides a desirable inductive bias and achieves high generation quality. Inspired by its success, we employ a denoising diffusion framework to learn the gradient of the data distribution of the graphs and generate a more expressive adjacency matrix in a conditional manner. Unlike Niu et al. [33], we need to generate an unweighted directed graph in the latent semantic space. Moreover, to take full advantage of external factors, we have to generate the semantic adjacency matrix with external factors as the condition.
Our conditional generation model can be simply formalized as estimating \(p(\mathbf {A}_t \mid \mathbf {e}_t)\), which means that we generate the semantic adjacency matrix at time t \(\mathbf {A}_t\) conditioned on the external factors \(\mathbf {e}_t\). Specifically, we perturb \(\mathbf {A}_t\) and \(\mathbf {e}_t\) with the white noise of varying scales to meet the requirement that the probability density function should be non-zero everywhere. The noise is assumed to be sufficiently small so that \(p_\sigma (\tilde{\mathbf {A}}_t) \approx p(\mathbf {A}_t)\) and \(p_\tau (\tilde{\mathbf {e}}_t) \approx p(\mathbf {e}_t)\). According to Bayes’ theorem,
\begin{align} p_{\sigma ,\tau }(\tilde{\mathbf {A}}_t \mid \tilde{\mathbf {e}}_t) = p_{\sigma ,\tau }(\tilde{\mathbf {e}}_t \mid \tilde{\mathbf {A}}_t) p_\sigma (\tilde{\mathbf {A}}_t) / p_\tau (\tilde{\mathbf {e}}_t). \end{align}
(14)
Correspondingly, we can decompose the conditional score function into a mixture of scores by taking the log-gradient on both sides of Equation (14):
\begin{align} \begin{aligned}\nabla _{\tilde{\mathbf {A}}} \log p_{\sigma , \tau }(\tilde{\mathbf {A}}_t \mid \tilde{\mathbf {e}}_t)=&\nabla _{\tilde{\mathbf {A}}} \log p_{\sigma , \tau }(\tilde{\mathbf {e}}_t \mid \tilde{\mathbf {A}}_t)+\nabla _{\tilde{\mathbf {A}}} \log p_{\sigma }(\tilde{\mathbf {A}}_t) -\underbrace{\nabla _{\tilde{\mathbf {A}}} \log p_{\tau }(\tilde{\mathbf {e}}_t)}_{=0}, \end{aligned} \end{align}
(15)
where \(\nabla _{\tilde{\mathbf {A}}} \log p_{\sigma , \tau }(\tilde{\mathbf {A}}_t \mid \tilde{\mathbf {e}}_t)\) is the posterior score, \(\nabla _{\tilde{\mathbf {A}}} \log p_{\sigma , \tau }(\tilde{\mathbf {e}}_t \mid \tilde{\mathbf {A}}_t)\) is the likelihood score, and \(\nabla _{\tilde{\mathbf {A}}} \log p_{\sigma }(\tilde{\mathbf {A}}_t)\) is the prior score. The last term vanishes because \(p_{\tau }(\tilde{\mathbf {e}}_t)\) does not depend on \(\tilde{\mathbf {A}}_t\). A conditional score model can be derived from the log-gradient of a differentiable classifier \(p(\tilde{\mathbf {e}}_t \mid \tilde{\mathbf {A}}_t; \phi)\) and a prior score model \(\mathbf {s}(\tilde{\mathbf {A}}_t; \theta)\):
\begin{align} \nabla _{\tilde{\mathbf {A}}} \log p(\tilde{\mathbf {A}}_t \mid \tilde{\mathbf {e}}_t ; \theta , \phi)=\nabla _{\tilde{\mathbf {A}}} \log p(\tilde{\mathbf {e}}_t \mid \tilde{\mathbf {A}}_t ; \phi)+\mathbf {s}(\tilde{\mathbf {A}}_t ; \theta). \end{align}
(16)
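The decomposition in Equation (16) can be checked on a one-dimensional toy problem. In the sketch below, the standard normal prior, the logistic classifier with weight `w`, and all function names are assumptions chosen so that both terms have closed forms; they stand in for the learned score network and classifier and are not part of DiffUFP.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prior_score(a):
    """Score of a standard normal prior: d/da log N(a; 0, 1) = -a."""
    return -a


def classifier_log_grad(a, w):
    """d/da log p(e = 1 | a) for the toy classifier p(e = 1 | a) = sigmoid(w * a)."""
    return w * (1.0 - sigmoid(w * a))


def conditional_score(a, w):
    """Equation (16): classifier log-gradient plus prior score."""
    return classifier_log_grad(a, w) + prior_score(a)


def log_posterior_unnormalized(a, w):
    """log p(e = 1 | a) + log p(a), dropping terms that do not depend on a."""
    return math.log(sigmoid(w * a)) - 0.5 * a * a


# Central finite difference of the unnormalized log-posterior at a = 0.3.
h = 1e-5
numeric = (log_posterior_unnormalized(0.3 + h, 2.0)
           - log_posterior_unnormalized(0.3 - h, 2.0)) / (2.0 * h)
analytic = conditional_score(0.3, 2.0)
```

The normalizing constant of the posterior never appears, which is why the conditional score can be assembled from two separately trained components.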
We train the classifier with the cross-entropy loss function \(L_{\mathrm{CE}}(\phi)\) and the score model with the denoising score matching loss function \(L_{\mathrm{DE}}(\theta)\):
\begin{align} &L_{\mathrm{CE}}(\phi)=\mathbb {E}_{p_{\sigma ,\tau }(\tilde{\mathbf {A}} \mid \tilde{\mathbf {e}})}[-\log p(\tilde{\mathbf {e}} \mid \tilde{\mathbf {A}}; \phi)], \end{align}
(17)
\begin{align} &L_{\mathrm{DE}}(\theta)= \frac{1}{2}\mathbb {E}_{p_{\sigma }(\tilde{\mathbf {A}}, \mathbf {A})}\left\Vert \mathbf {s}(\tilde{\mathbf {A}} ; \theta)-\nabla _{\tilde{\mathbf {A}}} \log p_{\sigma }(\tilde{\mathbf {A}} \mid \mathbf {A})\right\Vert ^{2}. \end{align}
(18)
The total training loss function of the conditional adjacency matrix generation is as follows:
\begin{align} L_{\mathrm{Total}}=L_{\mathrm{DE}}(\theta)+\lambda L_{\mathrm{CE}}(\phi), \end{align}
(19)
where \(\lambda \gt 0\) is a balance coefficient. The detailed training process is shown in Figure 5.
Fig. 5. The training procedure of the proposed conditional adjacency matrix generation. A score model \(\mathbf {s}(\tilde{\mathbf {A}}_t; \theta)\) is updated in terms of \(L_{\mathrm{DE}}(\theta)\) to match the score function. A classifier \(\nabla _{\tilde{\mathbf {A}}} \log p(\tilde{\mathbf {e}}_t \mid \tilde{\mathbf {A}}_t ; \phi)\) is updated by using \(L_{\mathrm{CE}}(\phi)\) with the frozen weights of the trained score networks.
As for the score model, we add Gaussian noise to the temporal semantic adjacency matrix \(\mathbf {A}^{te}\) obtained in Section 4.2.2. In particular, to preserve the semantic information as much as possible, we use \(\mathbf {A}^{te}\) before it is quantized by \(\epsilon ^{te}\) during the diffusion process. In the following, we omit the superscript of \(\mathbf {A}^{te}\) for brevity. Specifically, we choose a geometric sequence \(\lbrace \sigma _{i}\rbrace _{i=0}^{L}\) as the set of noise scales, where \(L=1,000\), \(\sigma _L = 10\), \(\sigma _0 = 0.01\), and \(\sigma _{i}=\sigma _{0}(\frac{\sigma _{L }}{\sigma _{0}})^{\frac{i}{L}}\). Let \(p_{\sigma _i}(\tilde{\mathbf {A}}_i \mid \mathbf {A})=\mathcal {N}(\tilde{\mathbf {A}}_i \mid \mathbf {A}, \sigma _i^{2} \mathbf {I})\), and denote the corresponding noise-perturbed data distribution as \(p_{\sigma _i}(\tilde{\mathbf {A}}_i) \triangleq \int p_{\sigma _i}(\tilde{\mathbf {A}}_i \mid \mathbf {A}) p(\mathbf {A}) \mathrm{d} \mathbf {A}\). We train a joint score network to estimate the score function of each \(p_{\sigma _i}(\tilde{\mathbf {A}}_i)\) by instantiating \(L_{\mathrm{DE}}(\theta)\) as
\begin{align} \frac{1}{2 L} \sum _{i=1}^{L} \mathbb {E}_{p(\mathbf {A})} \mathbb {E}_{p_{\sigma _{i}}(\tilde{\mathbf {A}}_i \mid \mathbf {A})}\left[\left\Vert \sigma _{i} \mathbf {s}_{\boldsymbol {\theta }}\Big (\tilde{\mathbf {A}}_i, \sigma _{i}\Big)+\frac{\tilde{\mathbf {A}}_i-\mathbf {A}}{\sigma _{i}}\right\Vert _{2}^{2}\right]. \end{align}
(20)
Besides, we use \(L_{\mathrm{CE}}(\phi)\) to update the classifier \(\nabla _{\tilde{\mathbf {A}}} \log p(\tilde{\mathbf {e}} \mid \tilde{\mathbf {A}} ; \phi),\) allowing it to capture the fine-grained relationships between transitions of urban flows and external factors.
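The geometric noise schedule and the bracketed term of Equation (20) can be sketched as follows. The flat-list representation of the adjacency entries, the `score_fn` callable standing in for the score network, and the closed-form oracle used for the check are assumptions for illustration only.

```python
import random


def geometric_noise_scales(sigma_0, sigma_L, L):
    """sigma_i = sigma_0 * (sigma_L / sigma_0) ** (i / L) for i = 0..L."""
    return [sigma_0 * (sigma_L / sigma_0) ** (i / L) for i in range(L + 1)]


def dsm_term(score_fn, clean, sigma, rng):
    """One Monte Carlo sample of the squared norm inside Equation (20).

    clean is a flat list of adjacency entries; score_fn(noisy, sigma) is a
    hypothetical stand-in for the score network s_theta.
    """
    noisy = [a + sigma * rng.gauss(0.0, 1.0) for a in clean]
    score = score_fn(noisy, sigma)
    return sum((sigma * s + (x - a) / sigma) ** 2
               for s, x, a in zip(score, noisy, clean))


def make_oracle_score(clean):
    """Exact score of N(clean, sigma^2 I): -(noisy - clean) / sigma^2."""
    return lambda noisy, sigma: [-(x - a) / sigma ** 2
                                 for x, a in zip(noisy, clean)]


# Values from the text: sigma_0 = 0.01, sigma_L = 10, L = 1000.
scales = geometric_noise_scales(sigma_0=0.01, sigma_L=10.0, L=1000)
clean = [0.2, 0.9, 0.0, 0.7]
rng = random.Random(0)
loss = dsm_term(make_oracle_score(clean), clean, scales[500], rng)
```

For a single clean matrix the conditional score is known in closed form, so the oracle drives the term to zero up to rounding; a trained network would instead minimize the average of this term over the data and all noise scales.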
Subsequently, we perform conditional sampling (outlined in Algorithm 1) based on Equation (16) to generate the semantic matrix \(\tilde{\mathbf {A}}_0\). Because the samples \(\mathbf {A}^{\text{(sample)}}\) from score-based generation are in the continuous space, we discretize the obtained continuous adjacency matrix to a binary one at the end of Langevin dynamics and acquire the enhanced temporal adjacency matrix \(\mathbf {A}^{\prime }\) by a quantization operation:
\begin{align} \mathbf {A}^{\prime }_{i,j} = \mathbb {1}_{\mathbf {A}^{\text{(sample)}}_{i,j}\gt 0.5}, \end{align}
(21)
where \(\mathbb {1}\) denotes the indicator function, which takes the value 1 if the condition holds and 0 otherwise. We acquire \(\mathbf {A}^{\prime }_{1},\mathbf {A}^{\prime }_{2},\dots ,\mathbf {A}^{\prime }_{\tau _{in}}\) and take their mean to get \(\mathbf {A}^{\prime }\). Finally, we fuse \(\mathbf {A}^{\prime }\) and the spatial adjacency matrix \(\mathbf {A}^{sp}\) to compute the final adjacency matrix \(\mathbf {A}^{st}\):
\begin{align} \mathbf {A}^{st}=\mathbf {A}^{\prime } \oplus \beta \mathbf {A}^{sp}, \end{align}
(22)
where \(\beta \gt 0\) is a balance coefficient.
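The post-processing in Equations (21) and (22) amounts to thresholding, averaging over timesteps, and a weighted sum. The sketch below assumes that \(\oplus\) is element-wise addition, as defined for Equation (11), and uses hypothetical 2 \(\times\) 2 samples in place of real Langevin dynamics outputs.

```python
def quantize(sample, threshold=0.5):
    """Equation (21): binarize a continuous sample entry by entry."""
    return [[1 if v > threshold else 0 for v in row] for row in sample]


def mean_matrices(mats):
    """Entry-wise mean of equally sized matrices (average over timesteps)."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[r][c] for m in mats) / n for c in range(cols)]
            for r in range(rows)]


def fuse(a_prime, a_sp, beta):
    """Equation (22), reading the fusion operator as element-wise addition."""
    return [[x + beta * y for x, y in zip(row_p, row_s)]
            for row_p, row_s in zip(a_prime, a_sp)]


# Two hypothetical continuous samples, one per input timestep.
samples = [[[0.9, 0.2], [0.6, 0.4]],
           [[0.1, 0.7], [0.8, 0.3]]]
a_prime = mean_matrices([quantize(s) for s in samples])
a_sp = [[1, 1], [1, 1]]
a_st = fuse(a_prime, a_sp, beta=0.2)
```

Averaging the binary matrices turns each entry of \(\mathbf {A}^{\prime }\) into the fraction of timesteps in which that semantic edge was present, so the fused matrix is real valued rather than binary.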

4.3 Prediction and Loss Function

The final step of DiffUFP is to predict the future urban flow. To this end, the learned node and edge representations are fed into the STG layers. It is worth noting that, unlike STGODE [10], DiffUFP employs only one fused adjacency matrix to simultaneously capture the spatial and semantic knowledge. This property reduces the number of parameters and hence accelerates the training process. The two STG layers are followed by a max-pooling layer and an output layer that generate the urban flow prediction.
We use the Huber loss [15], a tradeoff between the squared error loss and the absolute error loss, as our objective function. Letting x and \(\hat{x}\) be the ground truth and the prediction, respectively, the Huber loss \(\mathcal {L}(x, \hat{x})\) is formulated as
\begin{align} \mathcal {L}(x, \hat{x})=\left\lbrace \!\!\begin{array}{lr}\frac{1}{2}(x-\hat{x})^{2} & \text{ for }|x-\hat{x}| \le \delta \\ \delta |x-\hat{x}|-\frac{1}{2} \delta ^{2}, & \text{otherwise}\end{array}\right., \end{align}
(23)
where \(\delta\) is the hyperparameter to control the model sensitivity to outliers. The detailed processing steps are summarized in Algorithm 2.
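A scalar version of Equation (23) makes the role of \(\delta\) concrete. The function name and the default \(\delta = 1\) are illustrative assumptions; the value of \(\delta\) used in the experiments is not stated in this section.

```python
def huber_loss(x, x_hat, delta=1.0):
    """Equation (23): quadratic for small errors, linear beyond delta."""
    err = abs(x - x_hat)
    if err <= delta:
        return 0.5 * err ** 2
    return delta * err - 0.5 * delta ** 2


small = huber_loss(0.0, 0.5)     # inside the quadratic zone
boundary = huber_loss(0.0, 1.0)  # both branches agree at |error| = delta
large = huber_loss(0.0, 3.0)     # linear zone, so outliers are penalized less
```

The two branches meet at \(|x-\hat{x}| = \delta\), so the loss is continuous, and large residuals grow linearly rather than quadratically.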

4.4 Computational Complexity

Compared with previous graph-based UFP models, DiffUFP introduces two extra components: the dynamic semantic region extractor and the spatio-temporal adjacency matrix generator.
Dynamic Semantic Region Extractor. The whole procedure of the dynamic semantic region extractor can be divided into three parts: (1) extracting the local information from inputs, (2) learning the spatio-temporal interactions of semantic regions, and (3) remapping the extracted semantic regions to the urban map. Let T denote the total number of timesteps, E the set of edges connecting semantic regions, and \(K(=3)\) the number of iterations for spatio-temporal message passing. The total time complexity is \(\mathcal {O}(T \times (2|E|) \times K + T \times N_r \times (K+1))\). Given that \(|E|\) is upper bounded by \(N_r(N_r-1)/2\), the complexity is \(\mathcal {O}(T \times {N_r}^2 \times K)\).
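As a rough sanity check of this bound, the per-timestep count can be worked out for the region counts reported in Section 5.1.3 (\(N_r = 5\) for TaxiBJ and \(N_r = 7\) for BikeNYC) with \(K = 3\). Reading each term of the complexity expression as a unit-cost operation count and taking the fully connected upper bound on \(|E|\) are assumptions made here for illustration:
\begin{align*} N_r = 5:&\quad |E| \le \tfrac{5 \cdot 4}{2} = 10, \quad 2|E|K + N_r(K+1) \le 2 \cdot 10 \cdot 3 + 5 \cdot 4 = 80, \\ N_r = 7:&\quad |E| \le \tfrac{7 \cdot 6}{2} = 21, \quad 2|E|K + N_r(K+1) \le 2 \cdot 21 \cdot 3 + 7 \cdot 4 = 154. \end{align*}
In both cases the \(2|E|K\) term dominates (60 of 80 and 126 of 154), and it grows as \(N_r^2 K\), in line with the stated \(\mathcal {O}(T \times {N_r}^2 \times K)\) bound.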
Spatio-Temporal Adjacency Matrix Generator. It is usually slow to generate a sample from the Markov chain of the reverse process, as the number of diffusion steps T can be large [39]. To speed up training and sampling, we use the denoising score-based model to generate the adjacency matrix conditionally. Our score-based graph generation is therefore much faster than directly predicting the time series, which treats UFP as a multivariate time series forecasting task. The computational complexity is reduced from \(N^2 \times T \times 2\) to \(N^2\).

5 Experiments

In this section, we conduct extensive experiments on two real-world urban flow datasets to evaluate DiffUFP. Through the experiments, we want to answer the following research questions:
RQ1: Can DiffUFP outperform the state-of-the-art methods in UFP?
RQ2: How do the critical designs (i.e., the spatio-temporal adjacency matrix generator and dynamic semantic region extractor) improve the prediction performance of DiffUFP?
RQ3: How do the important parameters of DiffUFP influence the performance of DiffUFP?
RQ4: Can the proposed dynamic semantic region extractor extract meaningful semantic regions?

5.1 Experimental Settings

5.1.1 Datasets.

The statistics of the two traffic flow datasets, described next, are summarized in Table 2:
| Dataset | TaxiBJ | BikeNYC |
| --- | --- | --- |
| Data type | Taxi GPS | Bike rent |
| Location | Beijing | New York |
| Start time | 7/1/2013 | 4/1/2014 |
| End time | 4/10/2016 | 12/30/2014 |
| Time interval | 0.5h | 1h |
| Holidays | 41 | 20 |
| Weather | 15 types | \(\backslash\) |
| Temperature/\({ }^{\circ } \mathrm{C}\) | [–24.6, 41.0] | \(\backslash\) |
| Wind speed/mph | [0, 48.6] | \(\backslash\) |
| Latitude | [39.82, 39.99] | [40.65, 40.81] |
| Longitude | [116.26, 116.49] | [–74.07, –74.00] |

Table 2. Statistics of the Datasets
TaxiBJ: This data contains the trajectories of taxi GPS in Beijing and is divided into four time periods: July 1 to October 30, 2013, March 1 to June 30, 2014, March 1 to June 30, 2015, and November 1, 2015 to April 10, 2016. Each trajectory is mapped into 32 \(\times\) 32 grid-based geographical regions.
BikeNYC:1 This dataset was collected from the NYC Bike system from April 1, 2014 to September 30, 2014. Each trajectory is projected into a 16 \(\times\) 8 grid map.
Data Preprocessing. For a fair comparison, we use the same data preprocessing as the baseline approaches [42, 54, 59, 61]. Specifically, we remove the data that do not have the full 48 and 24 timestamps in TaxiBJ and BikeNYC, respectively. To reflect the traffic dynamics from different temporal dimensions, the traffic flow is partitioned into three parts, namely the closeness, period, and trend of the crowd flow, following [59]. The external factors can typically be classified into continuous features (e.g., temperature) and categorical features (e.g., weather). To combine the two kinds of features, we embed the categorical features into low-dimensional vectors. The final external features are formed by concatenating the two types of external features. We scale the traffic volumes and numerical external factors into the range [–1,1] by using Min-Max normalization to eliminate the difference between the measurement units of different features. After that, the processed external features are fed into the adjacency matrix generator to learn the semantic adjacency matrix in a conditional fashion. The two datasets are divided into training, validation, and testing sets with a ratio of 6:2:2. During testing, we rescale the predicted flow back to the original range.
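The scaling and splitting steps can be sketched as below. The helper names are hypothetical, and two details are assumptions because the text does not specify them: the split is taken in chronological order, and the minimum and maximum are fitted on whatever values are passed in.

```python
def fit_min_max(values):
    """Return the (min, max) pair used for scaling."""
    return min(values), max(values)


def scale(v, lo, hi):
    """Map v from [lo, hi] to [-1, 1]."""
    return 2.0 * (v - lo) / (hi - lo) - 1.0


def rescale(s, lo, hi):
    """Inverse of scale: map s from [-1, 1] back to [lo, hi]."""
    return (s + 1.0) / 2.0 * (hi - lo) + lo


def chronological_split(samples, ratios=(6, 2, 2)):
    """Split an ordered list into train/validation/test by the given ratio."""
    total = sum(ratios)
    n = len(samples)
    train_end = n * ratios[0] // total
    val_end = n * (ratios[0] + ratios[1]) // total
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


flows = [0.0, 25.0, 50.0, 100.0]
lo, hi = fit_min_max(flows)
scaled = [scale(v, lo, hi) for v in flows]
restored = [rescale(s, lo, hi) for s in scaled]
train, val, test = chronological_split(list(range(10)))
```

Applying `rescale` to the model outputs recovers flow volumes in the original units, so that error metrics are computed on unscaled values.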

5.1.2 Baselines.

We compare DiffUFP to the following 15 UFP models:
HA models the urban flows as a seasonal process (i.e., 1 day) and takes the average of the previous seasons as the prediction results.
ARIMA [35] is a classical time series model that combines autoregressive (AR) and moving average (MA) to perform prediction.
RNN [28] utilizes RNNs to capture the temporal dependencies to predict the sequential data.
Seq2Seq [43] is an encoder-decoder model that uses two stacked GRU layers to implement a sequence-to-sequence network. Node features are embedded by an FCN and fused with the outputs of the decoder.
DeepST [60] employs CNNs to encode the spatial interactions between grid regions on the citywide traffic flow map.
ST-ResNet [59] leverages the residual connection to alleviate the overfitting problem in spatio-temporal prediction.
DCRNN [22] uses diffusion RNNs to capture the dynamic spatio-temporal dependencies.
DMVST-Net [55] combines the convolutional recurrent networks and the graph embedding method to extract spatio-temporal signals and make flow predictions.
STDN [54] exploits the periodically shifted attention to learn the transition regularities of traffic flow.
MDL [61] is a multi-task deep learning framework that simultaneously predicts the node flow and edge flow.
ST-GCN [56] integrates a GCN and convolution sequence model to capture the spatial and temporal correlations.
ST-MGCN [11] employs a multi-modal GCN to learn region-wise interactions.
GMAN [63] integrates the graph multi-attention into an encoder-decoder traffic prediction framework.
ST-MetaNet [36] utilizes the meta-learning method to perform knowledge transfer across urban flows with a recurrent graph attentive network.
ST-GDN [62] employs a multi-scale attention network to capture multi-level temporal dynamics on the hierarchical GNNs.

5.1.3 Hyperparameters.

The hyperparameter configurations of DiffUFP are as follows:
The hyperparameters of the dynamic semantic region extractor: The numbers of semantic regions \(N_r\) for TaxiBJ and BikeNYC are 5 and 7, respectively. The number of channels \(C = 1\), as the outflow and inflow are two views of a semantic region. The dimensions of the hidden states of the GRUs used in the region extractor and graph processor are selected by grid search over \(\lbrace 32,64,128\rbrace\).
The hyperparameters of the spatio-temporal adjacency matrix generator: The thresholds \(\epsilon ^{sp}\) and \(\epsilon ^{te}\) of \(\mathbf {A}^{sp}\) and \(\mathbf {A}^{te}\) are set to 2.0 and 0.5, respectively. In addition, we set \(\sigma _1 = 0.01\) and \(\sigma _L = 1\). During sampling, we conduct 100 iterations, and the balance coefficient \(\beta\) is set to 0.2.
The structure of the STG: The hidden dimensions of the TCN blocks are set to 64, 32, and 64. Each STG layer contains two blocks. The batch sizes for training, validation, and testing are all 16.

5.1.4 Evaluation Protocols.

To evaluate the performance of DiffUFP, we adopt three widely used metrics: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE):
\[\begin{aligned}&\mathrm{RMSE}=\sqrt {\frac{1}{M} \sum _{i=1}^{M}\left(\widehat{\mathbf {X}}_{i}-\mathbf {X}_{i}\right)^{2}}, \\ &\mathrm{MAE}=\frac{1}{M} \sum _{i=1}^{M}\left|\widehat{\mathbf {X}}_{i}-\mathbf {X}_{i}\right|, \\ &\mathrm{MAPE}=\frac{1}{M} \sum _{i=1}^{M}\left|\frac{\widehat{\mathbf {X}}_{i}-\mathbf {X}_{i}}{\mathbf {X}_{i}}\right|, \end{aligned}\]
where M is the total number of predicted traffic flow volumes, and \(\widehat{\mathbf {X}}_{i}\) and \(\mathbf {X}_{i}\) denote the i-th predicted flow and its ground truth, respectively. For all three metrics, a smaller value means higher accuracy.
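The three metrics can be computed directly from their definitions, as in the following sketch over flat lists of predictions and ground-truth values. The function names are illustrative; the sketch assumes that no ground-truth value is zero (MAPE is undefined otherwise) and returns MAPE as a fraction, so it must be multiplied by 100 to match the percentage columns in the result tables.

```python
import math


def rmse(pred, truth):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def mape(pred, truth):
    """Mean absolute percentage error as a fraction; truth must be non-zero."""
    return sum(abs((p - t) / t) for p, t in zip(pred, truth)) / len(truth)


pred = [2.0, 4.0]
truth = [1.0, 2.0]
scores = (rmse(pred, truth), mae(pred, truth), mape(pred, truth))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE normalizes each error by the true volume and therefore weights low-traffic cells more strongly.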

5.2 Performance Comparison (RQ1)

The performance of DiffUFP and the baseline methods on the two datasets is reported in Tables 3 and 4. We run all models five times and report the results as “mean \(\pm\) standard deviation.” By examining the results, we make the following observations.
| Method | 30min MAE | 30min RMSE | 30min MAPE(%) | 1h MAE | 1h RMSE | 1h MAPE(%) | 2h MAE | 2h RMSE | 2h MAPE(%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HA | 26.16 \(\pm\) 0.00 | 56.47 \(\pm\) 0.00 | 34.21 \(\pm\) 0.00 | 26.16 \(\pm\) 0.00 | 56.47 \(\pm\) 0.00 | 34.21 \(\pm\) 0.00 | 26.16 \(\pm\) 0.00 | 56.47 \(\pm\) 0.00 | 34.21 \(\pm\) 0.00 |
| ARIMA | 24.24 \(\pm\) 0.00 | 41.75 \(\pm\) 0.00 | 32.67 \(\pm\) 0.00 | 27.12 \(\pm\) 0.00 | 58.27 \(\pm\) 0.00 | 32.67 \(\pm\) 0.00 | 41.21 \(\pm\) 0.00 | 76.98 \(\pm\) 0.00 | 32.67 \(\pm\) 0.00 |
| RNN | 17.85 \(\pm\) 0.17 | 33.90 \(\pm\) 0.21 | 24.79 \(\pm\) 0.23 | 19.63 \(\pm\) 0.10 | 40.42 \(\pm\) 0.12 | 27.86 \(\pm\) 0.13 | 31.45 \(\pm\) 0.11 | 68.21 \(\pm\) 0.19 | 29.80 \(\pm\) 0.23 |
| Seq2Seq | 17.18 \(\pm\) 0.12 | 30.16 \(\pm\) 0.10 | 24.54 \(\pm\) 0.16 | 17.82 \(\pm\) 0.05 | 35.12 \(\pm\) 0.07 | 27.64 \(\pm\) 0.07 | 22.09 \(\pm\) 0.06 | 62.75 \(\pm\) 0.14 | 29.54 \(\pm\) 0.17 |
| DeepST | 17.12 \(\pm\) 0.45 | 30.44 \(\pm\) 0.52 | 24.29 \(\pm\) 0.42 | 17.44 \(\pm\) 0.73 | 36.52 \(\pm\) 0.78 | 27.23 \(\pm\) 0.83 | 19.85 \(\pm\) 0.83 | 44.81 \(\pm\) 0.88 | 29.27 \(\pm\) 0.81 |
| ST-ResNet | 16.83 \(\pm\) 0.34 | 29.23 \(\pm\) 0.49 | 23.85 \(\pm\) 0.43 | 16.88 \(\pm\) 0.50 | 32.90 \(\pm\) 0.69 | 26.96 \(\pm\) 0.75 | 18.94 \(\pm\) 0.57 | 36.43 \(\pm\) 0.72 | 29.13 \(\pm\) 0.76 |
| DCRNN | 16.40 \(\pm\) 0.04 | 28.38 \(\pm\) 0.09 | 23.61 \(\pm\) 0.10 | 15.84 \(\pm\) 0.05 | 32.34 \(\pm\) 0.08 | 26.59 \(\pm\) 0.11 | 18.26 \(\pm\) 0.15 | 36.91 \(\pm\) 0.17 | 28.84 \(\pm\) 0.23 |
| DMVST-Net | 15.88 \(\pm\) 0.31 | 28.41 \(\pm\) 0.45 | 23.05 \(\pm\) 0.39 | 16.63 \(\pm\) 0.44 | 32.18 \(\pm\) 0.59 | 26.28 \(\pm\) 0.53 | 18.63 \(\pm\) 0.48 | 35.92 \(\pm\) 0.64 | 28.81 \(\pm\) 0.70 |
| STDN | 16.65 \(\pm\) 1.89 | 28.93 \(\pm\) 2.46 | 23.90 \(\pm\) 3.04 | 21.52 \(\pm\) 3.24 | 34.48 \(\pm\) 3.35 | 26.42 \(\pm\) 3.43 | 24.72 \(\pm\) 3.36 | 37.26 \(\pm\) 3.43 | 29.06 \(\pm\) 3.56 |
| MDL | 15.45 \(\pm\) 0.20 | 27.33 \(\pm\) 0.53 | 22.86 \(\pm\) 0.40 | 16.98 \(\pm\) 0.24 | 32.09 \(\pm\) 0.54 | 25.93 \(\pm\) 0.47 | 18.42 \(\pm\) 0.39 | 36.49 \(\pm\) 0.49 | 28.35 \(\pm\) 0.44 |
| ST-GCN | 14.73 \(\pm\) 0.13 | 26.79 \(\pm\) 0.31 | 22.34 \(\pm\) 0.26 | 15.56 \(\pm\) 0.14 | 31.74 \(\pm\) 0.35 | 25.85 \(\pm\) 0.36 | 17.96 \(\pm\) 0.27 | 36.31 \(\pm\) 0.31 | 28.26 \(\pm\) 0.32 |
| ST-MGCN | 14.01 \(\pm\) 0.18 | 25.13 \(\pm\) 0.20 | 21.53 \(\pm\) 0.23 | 15.08 \(\pm\) 0.11 | 30.72 \(\pm\) 0.23 | 25.17 \(\pm\) 0.25 | 17.18 \(\pm\) 0.22 | 35.03 \(\pm\) 0.34 | 27.75 \(\pm\) 0.39 |
| GMAN | 13.53 \(\pm\) 0.14 | 24.79 \(\pm\) 0.32 | 21.04 \(\pm\) 0.30 | 15.04 \(\pm\) 0.18 | 30.65 \(\pm\) 0.42 | 25.01 \(\pm\) 0.36 | 17.04 \(\pm\) 0.19 | 34.82 \(\pm\) 0.61 | 27.53 \(\pm\) 0.52 |
| ST-MetaNet+ | 12.80 \(\pm\) 0.15 | 23.65 \(\pm\) 0.28 | 20.73 \(\pm\) 0.25 | 14.72 \(\pm\) 0.18 | 29.70 \(\pm\) 0.40 | 24.88 \(\pm\) 0.32 | 16.85 \(\pm\) 0.16 | 33.93 \(\pm\) 0.35 | 27.48 \(\pm\) 0.38 |
| ST-GDN | 12.23 \(\pm\) 0.18 | 23.08 \(\pm\) 0.20 | 20.28 \(\pm\) 0.18 | 14.65 \(\pm\) 0.19 | 29.43 \(\pm\) 0.44 | 24.63 \(\pm\) 0.33 | 16.64 \(\pm\) 0.21 | 33.51 \(\pm\) 0.47 | 27.01 \(\pm\) 0.45 |
| DiffUFP | 11.97 \(\pm\) 0.13 | 22.31 \(\pm\) 0.19 | 20.05 \(\pm\) 0.14 | 14.21 \(\pm\) 0.12 | 28.90 \(\pm\) 0.18 | 24.29 \(\pm\) 0.20 | 16.02 \(\pm\) 0.19 | 32.72 \(\pm\) 0.24 | 26.42 \(\pm\) 0.30 |

Table 3. Performance Comparisons on the TaxiBJ Dataset
| Method | 1h MAE | 1h RMSE | 1h MAPE(%) | 2h MAE | 2h RMSE | 2h MAPE(%) | 3h MAE | 3h RMSE | 3h MAPE(%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HA | 9.54 \(\pm\) 0.00 | 19.02 \(\pm\) 0.00 | 36.34 \(\pm\) 0.00 | 9.54 \(\pm\) 0.00 | 19.02 \(\pm\) 0.00 | 36.34 \(\pm\) 0.00 | 9.54 \(\pm\) 0.00 | 19.02 \(\pm\) 0.00 | 36.34 \(\pm\) 0.00 |
| ARIMA | 4.89 \(\pm\) 0.00 | 11.10 \(\pm\) 0.00 | 34.52 \(\pm\) 0.00 | 7.83 \(\pm\) 0.00 | 16.45 \(\pm\) 0.00 | 34.52 \(\pm\) 0.00 | 8.82 \(\pm\) 0.00 | 17.63 \(\pm\) 0.00 | 34.52 \(\pm\) 0.00 |
| RNN | 4.50 \(\pm\) 0.12 | 9.14 \(\pm\) 0.16 | 28.56 \(\pm\) 0.31 | 7.51 \(\pm\) 0.15 | 14.93 \(\pm\) 0.23 | 31.23 \(\pm\) 0.40 | 8.04 \(\pm\) 0.17 | 14.32 \(\pm\) 0.35 | 33.24 \(\pm\) 0.66 |
| Seq2Seq | 4.13 \(\pm\) 0.09 | 8.86 \(\pm\) 0.12 | 26.42 \(\pm\) 0.19 | 7.24 \(\pm\) 0.13 | 12.96 \(\pm\) 0.17 | 31.18 \(\pm\) 0.22 | 7.62 \(\pm\) 0.41 | 13.80 \(\pm\) 0.55 | 32.67 \(\pm\) 0.30 |
| DeepST | 3.64 \(\pm\) 0.26 | 8.05 \(\pm\) 0.29 | 25.31 \(\pm\) 0.38 | 7.03 \(\pm\) 0.35 | 12.24 \(\pm\) 0.33 | 31.09 \(\pm\) 0.46 | 7.43 \(\pm\) 0.43 | 13.29 \(\pm\) 0.81 | 32.51 \(\pm\) 0.78 |
| ST-ResNet | 3.35 \(\pm\) 0.14 | 6.56 \(\pm\) 0.17 | 24.18 \(\pm\) 0.36 | 6.84 \(\pm\) 0.18 | 11.80 \(\pm\) 0.21 | 30.02 \(\pm\) 0.44 | 7.27 \(\pm\) 0.21 | 12.91 \(\pm\) 0.29 | 32.03 \(\pm\) 0.72 |
| DCRNN | 3.40 \(\pm\) 0.02 | 4.61 \(\pm\) 0.03 | 24.02 \(\pm\) 0.20 | 6.58 \(\pm\) 0.05 | 10.42 \(\pm\) 0.10 | 29.43 \(\pm\) 0.25 | 7.32 \(\pm\) 0.08 | 12.84 \(\pm\) 0.12 | 31.98 \(\pm\) 0.36 |
| DMVST-Net | 3.12 \(\pm\) 0.06 | 6.10 \(\pm\) 0.14 | 23.59 \(\pm\) 0.26 | 6.62 \(\pm\) 0.10 | 10.93 \(\pm\) 0.18 | 29.24 \(\pm\) 0.28 | 7.14 \(\pm\) 0.13 | 12.42 \(\pm\) 0.19 | 31.87 \(\pm\) 0.42 |
| STDN | 3.10 \(\pm\) 0.48 | 5.85 \(\pm\) 0.63 | 23.37 \(\pm\) 2.48 | 6.63 \(\pm\) 0.62 | 11.25 \(\pm\) 0.82 | 28.78 \(\pm\) 2.67 | 7.13 \(\pm\) 0.78 | 13.09 \(\pm\) 0.89 | 31.23 \(\pm\) 3.14 |
| MDL | 3.34 \(\pm\) 0.03 | 5.14 \(\pm\) 0.05 | 23.11 \(\pm\) 0.42 | 6.71 \(\pm\) 0.11 | 11.41 \(\pm\) 0.13 | 28.42 \(\pm\) 0.49 | 7.50 \(\pm\) 0.16 | 13.06 \(\pm\) 0.27 | 30.82 \(\pm\) 0.57 |
| ST-GCN | 3.28 \(\pm\) 0.04 | 4.76 \(\pm\) 0.05 | 22.54 \(\pm\) 0.25 | 6.60 \(\pm\) 0.07 | 10.38 \(\pm\) 0.14 | 27.60 \(\pm\) 0.31 | 7.26 \(\pm\) 0.12 | 12.68 \(\pm\) 0.20 | 30.55 \(\pm\) 0.50 |
| ST-MGCN | 3.14 \(\pm\) 0.03 | 4.51 \(\pm\) 0.05 | 22.41 \(\pm\) 0.29 | 6.24 \(\pm\) 0.06 | 9.63 \(\pm\) 0.08 | 27.21 \(\pm\) 0.36 | 6.89 \(\pm\) 0.08 | 11.71 \(\pm\) 0.10 | 30.12 \(\pm\) 0.48 |
| GMAN | 3.08 \(\pm\) 0.03 | 4.32 \(\pm\) 0.05 | 21.93 \(\pm\) 0.24 | 6.14 \(\pm\) 0.05 | 9.42 \(\pm\) 0.11 | 26.89 \(\pm\) 0.31 | 6.57 \(\pm\) 0.10 | 11.58 \(\pm\) 0.17 | 30.06 \(\pm\) 0.46 |
| ST-MetaNet+ | 3.04 \(\pm\) 0.04 | 4.46 \(\pm\) 0.09 | 21.99 \(\pm\) 0.25 | 5.98 \(\pm\) 0.08 | 9.21 \(\pm\) 0.13 | 26.93 \(\pm\) 0.30 | 6.54 \(\pm\) 0.10 | 11.34 \(\pm\) 0.15 | 29.65 \(\pm\) 0.42 |
| ST-GDN | 2.87 \(\pm\) 0.05 | 3.89 \(\pm\) 0.09 | 21.92 \(\pm\) 0.20 | 5.80 \(\pm\) 0.12 | 8.97 \(\pm\) 0.14 | 26.19 \(\pm\) 0.24 | 6.33 \(\pm\) 0.15 | 10.95 \(\pm\) 0.17 | 28.99 \(\pm\) 0.38 |
| DiffUFP | 2.42 \(\pm\) 0.03 | 3.54 \(\pm\) 0.05 | 20.84 \(\pm\) 0.17 | 5.53 \(\pm\) 0.07 | 8.42 \(\pm\) 0.09 | 25.31 \(\pm\) 0.20 | 5.97 \(\pm\) 0.08 | 9.54 \(\pm\) 0.12 | 27.67 \(\pm\) 0.30 |

Table 4. Performance Comparisons on the BikeNYC Dataset
First, statistical methods such as HA and ARIMA perform poorly, with the largest prediction errors on both datasets. This result demonstrates the high complexity of UFP, which cannot be accurately modeled by simple statistical approaches. Compared with the statistical methods, neural network based models exhibit much better performance. RNN and Seq2Seq model urban flow as sequences and utilize RNNs to capture the temporal property. However, they cannot learn the spatial dependencies that are essential in predicting urban flows. CNN-based models (e.g., DeepST, ST-ResNet) learn the spatial correlations with the convolutional operation, but they only consider static correlations without correctly modeling urban traffic dynamics.
Second, hybrid methods (e.g., DCRNN, STDN, and MDL) introduce the attention mechanism and can capture spatial and temporal knowledge simultaneously. Therefore, these approaches usually achieve better performance than RNN-based and CNN-based models. We found that STDN is unstable, with notably large standard deviations, because its convolution kernels and long short-term models share parameters. MDL considers the grid regions of the urban flow map as nodes and the flow transitions between different areas as edges. However, it leverages a multi-task learning framework to simultaneously predict the node flow and edge flow, which may ignore subtle correlations and compromise the prediction performance. This result suggests that we need more expressive models to learn the spatial and temporal dependencies for the UFP task.
Third, GNN-based methods including ours exhibit the best performance due to their ability to model the long-short term spatio-temporal correlations. Our DiffUFP achieves better performance than other methods due to its two special-designed modules. Compared to direct graph construction in previous studies, DiffUFP exploits a dynamic semantic region extractor to enhance the urban flow representation learning, which allows our model to discover the traffic interactions between semantic regions. Meanwhile, the spatio-temporal adjacency matrix generator enables DiffUFP to search geographically and semantically similar neighborhoods and aggregate more related urban flows in GNNs. Surprisingly, the performance becomes better as the increase of prediction horizons, which implies the superiority of our method in capturing long-term flow dependencies.
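The horizon-wise comparison can be checked directly against the reported numbers. The following is a minimal sketch that uses only the three BikeNYC rows visible at the end of Table 4; it assumes that the second value in each horizon triple is the RMSE (this matches the BikeNYC RMSEs of DiffUFP in Table 5). The helper name `absolute_gap` is illustrative.

```python
# Mean RMSE on BikeNYC for horizons 1h, 2h, 3h, read from Table 4.
# Assumption: the second value of each horizon triple is the RMSE column.
rmse = {
    "ST-MetaNet+": [4.46, 9.21, 11.34],
    "ST-GDN": [3.89, 8.97, 10.95],
    "DiffUFP": [3.54, 8.42, 9.54],
}


def absolute_gap(baseline, ours):
    """Per-horizon RMSE reduction of `ours` relative to `baseline`."""
    return [round(b - o, 2) for b, o in zip(baseline, ours)]


gaps = {
    name: absolute_gap(values, rmse["DiffUFP"])
    for name, values in rmse.items()
    if name != "DiffUFP"
}

for name, gap in gaps.items():
    print(f"{name:12s} RMSE gap at 1h/2h/3h: {gap}")
```

For both baselines the gap is largest at the 3h horizon, although for ST-MetaNet+ it does not grow monotonically from 1h to 2h.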

5.3 Ablation Study (RQ2)

To verify the effectiveness of different modules of DiffUFP, we conduct an ablation experiment by designing four variants of DiffUFP:
DiffUFP w/o N removes the dynamic semantic region extractor and directly takes the grid-like regions as the nodes to construct the STGs.
DiffUFP w/o E deletes the spatio-temporal adjacency matrix generator. Instead, we take the external information as another branch of input features and use the spatial adjacency matrix \(\mathbf {A}^{sp}\) as the final edge-wise representation, which is similar to previous studies [62].
DiffUFP w/o D replaces the score-based model with a simple FCN to generate the enhanced semantic adjacency matrix \(\mathbf {A}^{\prime }\) conditioned on external factors.
DiffUFP w/o N&E removes both the dynamic semantic region extractor and the spatio-temporal adjacency matrix generator from DiffUFP.
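The four variants can be summarized as combinations of three on/off switches. The sketch below is only an illustration of that bookkeeping: the class `AblationConfig` and its field names are hypothetical and are not taken from the authors' implementation. For the variants without the adjacency matrix generator, the score-model switch is set to `False` because there is no generator for it to act in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AblationConfig:
    """Hypothetical switches; names are illustrative, not the authors' code."""

    use_region_extractor: bool = True  # False: grid-like regions are used directly as nodes
    use_adj_generator: bool = True     # False: fall back to the spatial adjacency matrix A^sp
    use_score_model: bool = True       # False: a simple FCN generates A' instead


VARIANTS = {
    "DiffUFP": AblationConfig(),
    "DiffUFP w/o N": AblationConfig(use_region_extractor=False),
    "DiffUFP w/o E": AblationConfig(use_adj_generator=False, use_score_model=False),
    "DiffUFP w/o D": AblationConfig(use_score_model=False),
    "DiffUFP w/o N&E": AblationConfig(
        use_region_extractor=False,
        use_adj_generator=False,
        use_score_model=False,
    ),
}

for name, cfg in VARIANTS.items():
    print(f"{name:16s} {cfg}")
```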
The ablation results are presented in Table 5. The poor performance of DiffUFP w/o N&E confirms the significant effects of our two key designs. DiffUFP w/o E gains a remarkable improvement over DiffUFP w/o N&E, indicating that the dynamic semantic region extractor is capable of learning the underlying structure of traffic networks by dynamically extracting semantic regions from the grid-like urban flow map. In most cases, DiffUFP w/o N is superior to DiffUFP w/o E (the exception is the 3h horizon on BikeNYC), showing that our conditional score-based adjacency matrix helps the model capture intricate transitional patterns and learn the influence of external factors. DiffUFP w/o D outperforms DiffUFP w/o E in all cases, showing that the semantic adjacency matrix can learn dynamic semantic patterns. Meanwhile, DiffUFP w/o D performs worse than the full DiffUFP, demonstrating the effectiveness of the score-based model in mining the intrinsic relationships between external factors and traffic flow transitions. According to the evaluations, both DiffUFP w/o N and DiffUFP w/o E achieve decent performance on multi-step predictions, which verifies the effects of the dynamic semantic region extractor and the spatio-temporal adjacency matrix generator in boosting the expressiveness of the GNN on long-term temporal horizon predictions.
Model | 1h | 2h | 3h
Experimental results on TaxiBJ
DiffUFP w/o N&E | 31.30\(_{\pm 0.36}\) | 35.52\(_{\pm 0.50}\) | 37.96\(_{\pm 0.56}\)
DiffUFP w/o E | 30.84\(_{\pm 0.33}\) | 34.27\(_{\pm 0.50}\) | 36.18\(_{\pm 0.51}\)
DiffUFP w/o N | 29.93\(_{\pm 0.29}\) | 33.65\(_{\pm 1.87}\) | 35.64\(_{\pm 0.47}\)
DiffUFP w/o D | 29.57\(_{\pm 0.31}\) | 33.04\(_{\pm 0.48}\) | 35.10\(_{\pm 0.45}\)
DiffUFP | 28.97\(_{\pm 0.18}\) | 32.71\(_{\pm 0.24}\) | 34.47\(_{\pm 0.41}\)
Experimental results on BikeNYC
DiffUFP w/o N&E | 4.51\(_{\pm 0.10}\) | 9.70\(_{\pm 0.18}\) | 12.08\(_{\pm 0.23}\)
DiffUFP w/o E | 4.05\(_{\pm 0.08}\) | 9.14\(_{\pm 0.15}\) | 10.59\(_{\pm 0.22}\)
DiffUFP w/o N | 3.88\(_{\pm 0.07}\) | 8.86\(_{\pm 0.12}\) | 10.87\(_{\pm 0.15}\)
DiffUFP w/o D | 3.83\(_{\pm 0.09}\) | 8.73\(_{\pm 0.15}\) | 10.44\(_{\pm 0.17}\)
DiffUFP | 3.54\(_{\pm 0.05}\) | 8.42\(_{\pm 0.09}\) | 9.54\(_{\pm 0.12}\)
Table 5. Ablation Results (RMSE) on Two Datasets
We compare DiffUFP with its three variants and use DiffUFP w/o N&E as the base for computing the improvements of predictions on different horizons.
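The improvement over the base variant can be reproduced from the mean RMSEs in Table 5. The sketch below computes the relative RMSE reduction with respect to DiffUFP w/o N&E; the function name `relative_improvement` and the choice to express the reduction as a percentage are assumptions made for illustration.

```python
# Mean RMSE from Table 5 for horizons 1h, 2h, 3h (standard deviations omitted).
table5 = {
    "TaxiBJ": {
        "DiffUFP w/o N&E": [31.30, 35.52, 37.96],
        "DiffUFP w/o E": [30.84, 34.27, 36.18],
        "DiffUFP w/o N": [29.93, 33.65, 35.64],
        "DiffUFP w/o D": [29.57, 33.04, 35.10],
        "DiffUFP": [28.97, 32.71, 34.47],
    },
    "BikeNYC": {
        "DiffUFP w/o N&E": [4.51, 9.70, 12.08],
        "DiffUFP w/o E": [4.05, 9.14, 10.59],
        "DiffUFP w/o N": [3.88, 8.86, 10.87],
        "DiffUFP w/o D": [3.83, 8.73, 10.44],
        "DiffUFP": [3.54, 8.42, 9.54],
    },
}

BASE = "DiffUFP w/o N&E"


def relative_improvement(base, variant):
    """Percentage RMSE reduction of `variant` relative to `base`, per horizon."""
    return [100.0 * (b - v) / b for b, v in zip(base, variant)]


improvements = {
    dataset: {
        name: relative_improvement(rows[BASE], values)
        for name, values in rows.items()
        if name != BASE
    }
    for dataset, rows in table5.items()
}

for dataset, rows in improvements.items():
    for name, imp in rows.items():
        print(dataset, name, [f"{x:.2f}%" for x in imp])
```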

5.4 Hyperparameter Sensitivity Analysis (RQ3)

Finally, we investigate the impacts of important parameters on DiffUFP in terms of UFP performance. Figure 6 reports the results of the parameter sensitivity of DiffUFP.
Fig. 6. Sensitivity analysis of DiffUFP. Mean RMSEs over five runs are reported together with their standard deviations. Vertical black dashed lines indicate the settings used in the experiments.
Batch Size B. We select the batch size from \(\lbrace 16, 32, 64, 128, 256\rbrace\). Figure 6(a) illustrates that for both datasets, the predictions are most accurate when the batch size is 16 (i.e., the larger the batch size, the worse the prediction results). This result suggests that a small batch size is enough for our model to learn the dynamics of urban flows.
Extracted Semantic Regions \(N_r\). The parameter \(N_r\) controls the number of semantic regions extracted by DiffUFP. We vary \(N_r\) from 3 to 15 and plot the results in Figure 6(b). The best predictions occur when \(N_r=5\) and \(N_r=7\) in TaxiBJ and BikeNYC, respectively. This difference is mainly due to the different trajectory properties of the two datasets. TaxiBJ is generated by taxis, whereas BikeNYC is produced by bikes. Typically, bikes are more flexible than taxis and visit more semantic regions.
Threshold \(\epsilon ^{sp}\). The threshold \(\epsilon ^{sp}\) controls the sparsity of the spatial adjacency matrix. We vary the spatial adjacency threshold from 1.0 to 4.0. As shown in Figure 6(c), the prediction errors are lowest when \(\epsilon ^{sp}\) is 2.0 on both datasets. The results also reveal that even though modern transportation enlarges the range of human mobility, people prefer to move within a moderately sized zone most of the time.
Coefficient \(\beta\). This is a variable used to balance the spatial information fusion in Equation (22). We adjust this coefficient from 0.05 to 1.0 to test the sensitivity of DiffUFP to the parameter \(\beta\), as illustrated in Figure 6(d). For both datasets, DiffUFP achieves the best performance when \(\beta\) is 0.2. This result demonstrates that a relatively small proportion of spatial adjacency information is sufficient for accurate flow predictions. Meanwhile, it also suggests that semantic adjacency knowledge is more important for learning the transitions of urban flows, as the semantic adjacency matrix is good at capturing complex long-range dependencies.
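To make the roles of \(\epsilon ^{sp}\) and \(\beta\) concrete, the following toy sketch builds a thresholded spatial adjacency matrix and blends it with a semantic one. It rests on two loudly stated assumptions that are not confirmed by this section: (i) two regions are spatially adjacent when the Euclidean distance between their centres is at most \(\epsilon ^{sp}\), and (ii) the fusion in Equation (22) is a convex combination \(\beta \mathbf {A}^{sp} + (1-\beta )\mathbf {A}^{\prime }\). The grid, the placeholder semantic matrix, and all function names are hypothetical.

```python
import math


def spatial_adjacency(coords, eps_sp):
    """Binary adjacency: 1 if two regions lie within eps_sp of each other (assumed rule)."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= eps_sp:
                adj[i][j] = 1.0
    return adj


def density(adj):
    """Fraction of off-diagonal entries that are non-zero."""
    n = len(adj)
    return sum(1 for row in adj for v in row if v != 0.0) / (n * (n - 1))


def fuse_adjacency(a_sp, a_sem, beta):
    """Blend two matrices as beta * a_sp + (1 - beta) * a_sem (assumed form of the fusion)."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return [
        [beta * s + (1.0 - beta) * m for s, m in zip(row_sp, row_sem)]
        for row_sp, row_sem in zip(a_sp, a_sem)
    ]


# Toy 4x4 grid of region centres with unit spacing (illustrative only).
coords = [(x, y) for x in range(4) for y in range(4)]
densities = {eps: density(spatial_adjacency(coords, eps)) for eps in (1.0, 2.0, 4.0)}

n = len(coords)
a_sp = spatial_adjacency(coords, 2.0)
# Placeholder semantic matrix; in DiffUFP this would come from the generator.
a_sem = [[0.0 if i == j else 0.5 for j in range(n)] for i in range(n)]
fused = fuse_adjacency(a_sp, a_sem, beta=0.2)

print(densities)
print(fused[0][1], fused[0][15])
```

Under these assumptions, a larger threshold yields a denser spatial graph, and a small \(\beta\) lets the semantic matrix dominate the fused edge weights.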

5.5 Semantic Region Discovery (RQ4)

A meaningful semantic region should have a significant volume of flow interactions and correspond to important districts of traffic networks. To validate the effect of our dynamic semantic region extractor, we plot the extracted semantic regions in Beijing. As depicted in Figure 7, the extracted semantic regions are consistent with the roads and transfer stations that are experiencing or will experience high volumes of in/outflows. For instance, the red area in Figure 7(a) covers the Sanyuan Bridge, a large-scale flyover where traffic jams frequently occur. Moreover, by contrasting the current urban flow maps (Figure 7(a) and Figure 7(c)) with the extracted semantic regions (Figure 7(b) and Figure 7(d)), we find that these regions will have vast flow transitions during the next short period. These phenomena demonstrate that our dynamic graph-based node representation method can discover salient regions as well as the underlying structure of urban traffic networks.
Fig. 7. The urban flows within the next hour (a brighter pixel represents a higher flow) and the corresponding semantic regions extracted by the dynamic semantic region extractor on TaxiBJ. We set the number of semantic regions to 5 according to Section 5.4. Each semantic region is indicated by its geographic location \((\Delta x_i,\Delta y_i)\) and its size \((w_i, h_i)\).
Finally, we visualize the prediction errors made by DiffUFP and its variants in different regions. Specifically, we plot the results on TaxiBJ in Figure 8, where brighter pixels indicate larger prediction errors. Clearly, the three variants are generally brighter than DiffUFP, which further verifies the effectiveness of the two essential designs in improving flow prediction performance. Furthermore, we map an extracted semantic region (the areas in the green rectangles) back to the urban map. In the rest of the map, DiffUFP generally performs better than the other three variants, which means that the semantic region plays an important role in learning the spatio-temporal interactions between traffic flows. However, the predictions inside the semantic areas are very similar (i.e., it is difficult for DiffUFP to discriminate the inner flows within the green rectangles). This happens because the proposed dynamic semantic region extractor extracts the semantic regions without access to the actual road networks, which limits its ability to make more accurate predictions inside the semantic regions. Improving the performance of DiffUFP in learning the spatio-temporal correlations inside the semantic regions is beyond the scope of this article and is left as future work.
Fig. 8. Visualization of prediction errors. The brighter the pixels, the larger the prediction errors. Green rectangles are semantic regions extracted by the dynamic semantic region extractor.
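The comparison between errors inside and outside a semantic region can be sketched as follows. This is a toy illustration with made-up numbers, not the authors' evaluation code: it assumes that \((\Delta x_i,\Delta y_i)\) is the top-left grid cell of the rectangle and \((w_i, h_i)\) its width and height in cells, and the function names are hypothetical.

```python
def abs_error_map(pred, truth):
    """Per-cell absolute error between two equally sized grids."""
    return [[abs(p - t) for p, t in zip(pr, tr)] for pr, tr in zip(pred, truth)]


def mean_error_inside_outside(err, dx, dy, w, h):
    """Mean error inside and outside the rectangle with top-left cell (dx, dy).

    Assumes the rectangle covers at least one cell and not the whole map.
    """
    inside, outside = [], []
    for y, row in enumerate(err):
        for x, e in enumerate(row):
            if dx <= x < dx + w and dy <= y < dy + h:
                inside.append(e)
            else:
                outside.append(e)
    return sum(inside) / len(inside), sum(outside) / len(outside)


# Toy 3x3 flow maps (illustrative values only).
truth = [[10, 10, 10], [10, 20, 20], [10, 20, 20]]
pred = [[9, 10, 11], [10, 17, 23], [10, 23, 17]]

err = abs_error_map(pred, truth)
inside_mean, outside_mean = mean_error_inside_outside(err, dx=1, dy=1, w=2, h=2)
print(err)
print(inside_mean, outside_mean)
```

In this toy case the error is concentrated inside the rectangle, which mirrors the qualitative observation that flows within a semantic region are harder to discriminate.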

6 Conclusion and Future Work

We presented DiffUFP, a novel GNN-based model for accurate UFP. It provides a new perspective on solving the spatio-temporal learning problem by constructing a more meaningful flow graph. We proposed a dynamic semantic region extractor that extracts semantic regions dynamically, and we introduced a conditional score-based adjacency matrix generator to capture the fine-grained impact of external factors on the transitions of urban flows. Comprehensive experiments on benchmark datasets verified the performance of the proposed model. In the future, we are interested in speeding up the training and sampling process of the conditional score-based networks. Applying DiffUFP to other spatio-temporal tasks, such as flow super-resolution [24] and traffic speed forecasting [10], is also of interest for future work.

References

[1]
Donald J. Berndt and James Clifford. 1994. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (AAAIWS). 359–370.
[2]
Andrew Brock, Jeff Donahue, and Karen Simonyan. 2019. Large scale GAN training for high fidelity natural image synthesis. In Proceedings of the International Conference on Learning Representations (ICLR).
[3]
Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 3 (2011), 1–27.
[4]
Chen-Hao Chao, Wei-Fang Sun, Bo-Wun Cheng, Yi-Chen Lo, Chia-Che Chang, Yu-Lun Liu, Yu-Lin Chang, Chia-Ping Chen, and Chun-Yi Lee. 2022. Denoising likelihood score matching for conditional score-based data generation. In Proceedings of the International Conference on Learning Representations (ICLR).
[5]
Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, and William Chan. 2021. WaveGrad: Estimating gradients for waveform generation. In Proceedings of the International Conference on Learning Representations (ICLR).
[6]
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). 1724–1734.
[7]
Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS). 3837–3845.
[8]
Jinliang Deng, Xiusi Chen, Zipei Fan, Renhe Jiang, Xuan Song, and Ivor W. Tsang. 2021. The pulse of urban transport: Exploring the co-evolving pattern for spatio-temporal forecasting. ACM Transactions on Knowledge Discovery from Data 15, 6 (2021), 1–25.
[9]
Tim Dockhorn, Arash Vahdat, and Karsten Kreis. 2022. Score-based generative modeling with critically-damped Langevin diffusion. In Proceedings of the International Conference on Learning Representations (ICLR).
[10]
Zheng Fang, Qingqing Long, Guojie Song, and Kunqing Xie. 2021. Spatial-temporal graph ODE networks for traffic flow forecasting. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 364–373.
[11]
Xu Geng, Yaguang Li, Leye Wang, Lingyu Zhang, Qiang Yang, Jieping Ye, and Yan Liu. 2019. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 3656–3663.
[12]
Jindong Han, Hao Liu, Hengshu Zhu, Hui Xiong, and Dejing Dou. 2021. Joint air quality and weather prediction based on multi-adversarial spatiotemporal networks. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 4081–4089.
[13]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 770–778.
[14]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In Proceedings of the 2020 Conference on Neural Information Processing Systems (NeurIPS). 6840–6851.
[15]
Peter J. Huber. 1992. Robust estimation of a location parameter. In Breakthroughs in Statistics. Springer, 492–518.
[16]
Aapo Hyvärinen and Peter Dayan. 2005. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research 6 (2005), 695–709.
[17]
Jiahao Ji, Jingyuan Wang, Zhe Jiang, Jiawei Jiang, and Hu Zhang. 2022. STDEN: Towards physics-guided neural networks for traffic flow prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 4048–4056.
[18]
Renhe Jiang, Zekun Cai, Zhaonan Wang, Chuang Yang, Zipei Fan, Quanjun Chen, Kota Tsubouchi, Xuan Song, and Ryosuke Shibasaki. 2021. DeepCrowd: A deep model for large-scale citywide crowd density and flow prediction. IEEE Transactions on Knowledge and Data Engineering 35, 1 (2021), 276–290.
[19]
Renhe Jiang, Xuan Song, Dou Huang, Xiaoya Song, Tianqi Xia, Zekun Cai, Zhaonan Wang, Kyoung-Sook Kim, and Ryosuke Shibasaki. 2019. DeepUrbanEvent: A system for predicting citywide crowd dynamics at big events. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2114–2122.
[20]
Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. 2021. DiffWave: A versatile diffusion model for audio synthesis. In Proceedings of the International Conference on Learning Representations (ICLR).
[21]
Jiyue Li, Senzhang Wang, Jiaqiang Zhang, Hao Miao, Junbo Zhang, and Philip Yu. 2022. Fine-grained urban flow inference with incomplete data. IEEE Transactions on Knowledge and Data Engineering (TKDE).
[22]
Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. In Proceedings of the International Conference on Learning Representations (ICLR).
[23]
Yuxuan Liang, Kun Ouyang, Lin Jing, Sijie Ruan, Ye Liu, Junbo Zhang, David S. Rosenblum, and Yu Zheng. 2019. UrbanFM: Inferring fine-grained urban flows. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 3132–3142.
[24]
Yuxuan Liang, Kun Ouyang, Junkai Sun, Yiwei Wang, Junbo Zhang, Yu Zheng, David Rosenblum, and Roger Zimmermann. 2021. Fine-grained urban flow prediction. In Proceedings of the International Conference of World Wide Web (WWW). 1833–1845.
[25]
Yuxuan Liang, Kun Ouyang, Yiwei Wang, Ye Liu, Junbo Zhang, Yu Zheng, and David S. Rosenblum. 2020. Revisiting convolutional neural networks for citywide crowd flow analytics. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD). 578–594.
[26]
Ziqian Lin, Jie Feng, Ziyang Lu, Yong Li, and Depeng Jin. 2019. DeepSTN+: Context-aware spatial-temporal neural network for crowd flow prediction in metropolis. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 1020–1027.
[27]
Jia Liu, Tianrui Li, Shenggong Ji, Peng Xie, Shengdong Du, Fei Teng, and Junbo Zhang. 2021. Urban flow pattern mining based on multi-source heterogeneous data fusion and knowledge graph embedding. IEEE Transactions on Knowledge and Data Engineering. Published Online, July 21, 2021.
[28]
Qiang Liu, Shu Wu, Liang Wang, and Tieniu Tan. 2016. Predicting the next location: A recurrent model with spatial and temporal contexts. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 194–200.
[29]
Yipeng Liu, Haifeng Zheng, Xinxin Feng, and Zhonghui Chen. 2017. Short-term traffic flow prediction with Conv-LSTM. In Proceedings of the International Conference on Wireless Communications and Signal Processing (WCSP). 1–6.
[30]
Bin Lu, Xiaoying Gan, Haiming Jin, Luoyi Fu, Xinbing Wang, and Haisong Zhang. 2022. Make more connections: Urban traffic flow forecasting with spatiotemporal adaptive gated graph convolution network. ACM Transactions on Intelligent Systems and Technology 13, 2 (2022), 1–25.
[31]
Zhilong Lu, Weifeng Lv, Zhipu Xie, Bowen Du, Guixi Xiong, Leilei Sun, and Haiquan Wang. 2022. Graph sequence neural network with an attention mechanism for traffic speed prediction. ACM Transactions on Intelligent Systems and Technology 13, 2 (2022), Article 20, 24 pages.
[32]
Andrei Nicolicioiu, Iulia Duta, and Marius Leordeanu. 2019. Recurrent space-time graph neural networks. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS). 12818–12830.
[33]
Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, and Stefano Ermon. 2020. Permutation invariant graph generation via score-based generative modeling. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS). 4474–4484.
[34]
Kun Ouyang, Yuxuan Liang, Ye Liu, Zekun Tong, Sijie Ruan, David Rosenblum, and Yu Zheng. 2020. Fine-grained urban flow inference. IEEE Transactions on Knowledge and Data Engineering 34, 6 (2020), 2755–2770.
[35]
Bei Pan, Ugur Demiryurek, and Cyrus Shahabi. 2012. Utilizing real-world transportation data for accurate traffic prediction. In Proceedings of the International Conference on Data Engineering (ICDE). 595–604.
[36]
Zheyi Pan, Yuxuan Liang, Weifeng Wang, Yong Yu, Yu Zheng, and Junbo Zhang. 2019. Urban traffic prediction from spatio-temporal data using deep meta learning. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 1720–1730.
[37]
Zheyi Pan, Wentao Zhang, Yuxuan Liang, Weinan Zhang, Yong Yu, Junbo Zhang, and Yu Zheng. 2020. Spatio-temporal meta learning for urban traffic prediction. IEEE Transactions on Knowledge and Data Engineering 34, 3 (2020), 1462–1476.
[38]
Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. 2021. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In Proceedings of the International Conference on Machine Learning (ICML). 8857–8868.
[39]
Jiaming Song, Chenlin Meng, and Stefano Ermon. 2021. Denoising diffusion implicit models. In Proceedings of the International Conference on Learning Representations (ICLR).
[40]
Yang Song and Stefano Ermon. 2019. Generative modeling by estimating gradients of the data distribution. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS). 11895–11907.
[41]
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-based generative modeling through stochastic differential equations. In Proceedings of the International Conference on Learning Representations (ICLR).
[42]
Junkai Sun, Junbo Zhang, Qiaofei Li, Xiuwen Yi, Yuxuan Liang, and Yu Zheng. 2020. Predicting citywide crowd flows in irregular regions using multi-view graph convolutional networks. IEEE Transactions on Knowledge and Data Engineering 34, 5 (2020), 2348–2359.
[43]
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 28th Conference on Neural Information Processing Systems (NeurIPS). 3104–3112.
[44]
Wenxin Tai, Yue Lei, Fan Zhou, Goce Trajcevski, and Ting Zhong. 2024. DOSE: Diffusion dropout with adaptive prior for speech enhancement. In Proceedings of the 2024 Conference on Neural Information Processing Systems. 36.
[45]
Wenxin Tai, Fan Zhou, Goce Trajcevski, and Ting Zhong. 2023. Revisiting denoising diffusion probabilistic models for speech enhancement: Condition collapse, efficiency and refinement. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 13627–13635.
[46]
Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. 2021. CSDI: Conditional score-based diffusion models for probabilistic time series imputation. In Proceedings of the 2021 Conference on Neural Information Processing Systems (NeurIPS). 24804–24816.
[47]
Pengyang Wang, Xiaolin Li, Yu Zheng, Charu Aggarwal, and Yanjie Fu. 2019. Spatiotemporal representation learning for driving behavior analysis: A joint perspective of peer and temporal dependencies. IEEE Transactions on Knowledge and Data Engineering 33, 2 (2019), 728–741.
[48]
Senzhang Wang, Hao Miao, Hao Chen, and Zhiqiu Huang. 2020. Multi-task adversarial spatial-temporal networks for crowd flow prediction. In Proceedings of the International Conference on Information and Knowledge Management (CIKM). 1555–1564.
[49]
Senzhang Wang, Meiyue Zhang, Hao Miao, Zhaohui Peng, and Philip S. Yu. 2022. Multivariate correlation-aware spatio-temporal graph convolutional networks for multi-scale traffic prediction. ACM Transactions on Intelligent Systems and Technology 13, 3 (2022), Article 38, 22 pages.
[50]
Xiaoyang Wang, Yao Ma, Yiqi Wang, Wei Jin, Xin Wang, Jiliang Tang, Caiyan Jia, and Jian Yu. 2020. Traffic flow prediction via spatial temporal graph neural network. In Proceedings of the Web Conference 2020. 1082–1092.
[51]
Xovee Xu, Zhiyuan Wang, Qiang Gao, Ting Zhong, Bei Hui, Fan Zhou, and Goce Trajcevski. 2023. Spatial-temporal contrasting for fine-grained urban flow inference. IEEE Transactions on Big Data 9, 6 (2023), 1711–1725.
[52]
Xovee Xu, Yutao Wei, Pengyu Wang, Xucheng Luo, Fan Zhou, and Goce Trajcevski. 2023. Diffusion probabilistic modeling for fine-grained urban traffic flow inference with relaxed structural constraint. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE, 5.
[53]
Xovee Xu, Fan Zhou, Kunpeng Zhang, and Siyuan Liu. 2022. CCGL: Contrastive cascade graph learning. IEEE Transactions on Knowledge and Data Engineering 35, 5 (2022), 4539–4554.
[54]
Huaxiu Yao, Xianfeng Tang, Hua Wei, Guanjie Zheng, and Zhenhui Li. 2019. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 5668–5675.
[55]
Huaxiu Yao, Fei Wu, Jintao Ke, Xianfeng Tang, Yitian Jia, Siyu Lu, Pinghua Gong, Jieping Ye, and Zhenhui Li. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2588–2595.
[56]
Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 3634–3640.
[57]
Haoyang Yu, Xovee Xu, Ting Zhong, and Fan Zhou. 2023. Overcoming forgetting in fine-grained urban flow inference via adaptive knowledge replay. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), Vol. 37. 5393–5401.
[58]
Rose Yu, Yaguang Li, Cyrus Shahabi, Ugur Demiryurek, and Yan Liu. 2017. Deep learning: A generic approach for extreme condition traffic forecasting. In Proceedings of the SIAM International Conference on Data Mining (SDM). 777–785.
[59]
Junbo Zhang, Yu Zheng, and Dekang Qi. 2017. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 1655–1661.
[60]
Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, and Xiuwen Yi. 2016. DNN-based prediction model for spatio-temporal data. In Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL). 1–4.
[61]
Junbo Zhang, Yu Zheng, Junkai Sun, and Dekang Qi. 2019. Flow prediction in spatio-temporal networks based on multitask deep learning. IEEE Transactions on Knowledge and Data Engineering 32, 3 (2019), 468–478.
[62]
Xiyue Zhang, Chao Huang, Yong Xu, Lianghao Xia, Peng Dai, Liefeng Bo, Junbo Zhang, and Yu Zheng. 2021. Traffic flow forecasting with spatial-temporal graph diffusion network. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 15008–15015.
[63]
Chuanpan Zheng, Xiaoliang Fan, Cheng Wang, and Jianzhong Qi. 2020. GMAN: A graph multi-attention network for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 1234–1241.
[64]
Fan Zhou, Liang Li, Kunpeng Zhang, and Goce Trajcevski. 2021. Urban flow prediction with spatial-temporal neural ODEs. Transportation Research Part C: Emerging Technologies 124 (2021), 102912.
[65]
Fan Zhou, Liang Li, Ting Zhong, Goce Trajcevski, Kunpeng Zhang, and Jiahao Wang. 2020. Enhancing urban flow maps via neural ODEs. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). 1295–1302.
[66]
Fan Zhou, Pengyu Wang, Xovee Xu, Wenxin Tai, and Goce Trajcevski. 2021. Contrastive trajectory learning for tour recommendation. ACM Transactions on Intelligent Systems and Technology 13, 1 (2021), 1–25.
[67]
Fan Zhou, Xovee Xu, Goce Trajcevski, and Kunpeng Zhang. 2021. A survey of information cascade analysis: Models, predictions, and recent advances. ACM Computing Surveys 54, 2 (2021), 1–36.
[68]
Yu Zhou, Haixia Zheng, Xin Huang, Shufeng Hao, Dengao Li, and Jumin Zhao. 2022. Graph neural networks: Taxonomy, advances, and trends. ACM Transactions on Intelligent Systems and Technology 13, 1 (2022), Article 15, 54 pages.
[69]
Xingchen Zou, Yibo Yan, Xixuan Hao, Yuehong Hu, Haomin Wen, Erdong Liu, Junbo Zhang, Yong Li, Tianrui Li, Yu Zheng, and Yuxuan Liang. 2024. Deep learning for cross-domain data fusion in urban computing: Taxonomy, advances, and outlook. arXiv:2402.19348 (2024).

    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 15, Issue 3
    June 2024
    646 pages
    EISSN:2157-6912
    DOI:10.1145/3613609
    • Editor:
    • Huan Liu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 17 May 2024
    Online AM: 01 April 2024
    Accepted: 26 February 2024
    Revised: 14 June 2023
    Received: 08 September 2022
    Published in TIST Volume 15, Issue 3

    Author Tags

    1. Flow prediction
    2. spatio-temporal learning
    3. graph neural networks
    4. score-based model
    5. urban computing

    Qualifiers

    • Research-article

    Funding Sources

    • Open project of the Intelligent Terminal Key Laboratory of Sichuan Province
    • National Natural Science Foundation of China
