Sum-Product-Attention Networks: Leveraging Self-Attention in Probabilistic Circuits
arXiv preprint arXiv:2109.06587, 2021
Probabilistic circuits (PCs) have become the de-facto standard for learning and inference in probabilistic modeling. We introduce Sum-Product-Attention Networks (SPAN), a new generative model that integrates probabilistic circuits with Transformers. SPAN uses self-attention to select the most relevant parts of a probabilistic circuit, here sum-product networks, to improve the modeling capability of the underlying sum-product network. We show that, while modeling, SPAN focuses on a specific set of independence assumptions in every product layer of the sum-product network. Our empirical evaluations show that SPAN outperforms state-of-the-art probabilistic generative models on various benchmark data sets and is an efficient generative image model.
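The core mechanism the abstract describes, using self-attention to select which parts of a sum-product network (i.e., which independence assumptions) are most relevant for a given input, can be illustrated with a minimal sketch. The code below is an assumption-laden toy, not the authors' implementation: `AttentiveSPN`, the fully factorized components, and all layer sizes are hypothetical, and a Transformer encoder simply produces data-dependent mixture weights for a single sum node.

```python
# Minimal illustrative sketch (NOT the SPAN authors' implementation):
# a tiny sum-product network over D variables whose sum-node mixture
# weights are produced by a Transformer encoder acting on the input.
import torch
import torch.nn as nn

class AttentiveSPN(nn.Module):
    def __init__(self, num_vars: int, num_components: int, d_model: int = 32):
        super().__init__()
        # Gaussian leaves: one (mean, log-std) per component and variable.
        self.means = nn.Parameter(torch.randn(num_components, num_vars))
        self.log_stds = nn.Parameter(torch.zeros(num_components, num_vars))
        # Self-attention module that scores which mixture component
        # (i.e., which set of independence assumptions) is most relevant.
        self.embed = nn.Linear(1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=1)
        self.to_weights = nn.Linear(d_model, num_components)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_vars); returns per-example log-likelihood.
        # Leaf log-densities per component: (batch, K, D).
        dist = torch.distributions.Normal(self.means, self.log_stds.exp())
        leaf_ll = dist.log_prob(x.unsqueeze(1))
        # Product layer: each component is fully factorized here, so the
        # product is a sum of leaf log-densities over the variables.
        comp_ll = leaf_ll.sum(dim=-1)  # (batch, K)
        # Self-attention over the variables yields data-dependent
        # sum-node (mixture) weights instead of static ones.
        tokens = self.embed(x.unsqueeze(-1))       # (batch, D, d_model)
        pooled = self.encoder(tokens).mean(dim=1)  # (batch, d_model)
        log_w = torch.log_softmax(self.to_weights(pooled), dim=-1)
        # Sum node: log of the attention-weighted mixture.
        return torch.logsumexp(log_w + comp_ll, dim=-1)

model = AttentiveSPN(num_vars=8, num_components=4)
x = torch.randn(16, 8)
nll = -model(x).mean()  # train by minimizing negative log-likelihood
```

Each mixture component stands in for one product-layer factorization; letting attention choose among them is one plausible reading of how SPAN "selects the most relevant parts" of the circuit, whereas the actual model applies this inside a full SPN structure.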