SciPost Submission Page
Generative Learning of Continuous Data by Tensor Networks
by Alex Meiburg, Jing Chen, Jacob Miller, Raphaëlle Tihon, Guillaume Rabusseau, Alejandro Perdomo-Ortiz
Submission summary
Authors (as registered SciPost users): Jing Chen
Submission information | |
---|---|
Preprint Link: | scipost_202404_00031v2 (pdf) |
Date accepted: | 2024-09-30 |
Date submitted: | 2024-09-03 19:15 |
Submitted by: | Chen, Jing |
Submitted to: | SciPost Physics |
Ontological classification | |
---|---|
Academic field: | Physics |
Specialties: | |
Approaches: | Theoretical, Computational |
Abstract
Beyond their origin in modeling many-body quantum systems, tensor networks have emerged as a promising class of models for solving machine learning problems, notably in unsupervised generative learning. While possessing many desirable features arising from their quantum-inspired nature, tensor network generative models have previously been largely restricted to binary or categorical data, limiting their utility in real-world modeling problems. We overcome this by introducing a new family of tensor network generative models for continuous data, which are capable of learning from distributions containing continuous random variables. We develop our method in the setting of matrix product states, first deriving a universal expressivity theorem proving the ability of this model family to approximate any reasonably smooth probability density function with arbitrary precision. We then benchmark the performance of this model on several synthetic and real-world datasets, finding that the model learns and generalizes well on distributions of continuous and discrete variables. We develop methods for modeling different data domains, and introduce a trainable compression layer which is found to increase model performance given limited memory or computational resources. Overall, our methods give important theoretical and empirical evidence of the efficacy of quantum-inspired methods for the rapidly growing field of generative learning.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Author comments upon resubmission
1. Improved Introduction: We have revised the introduction to enhance readability, making it more accessible to researchers without a background in quantum physics.
2. Clarification of the Connection with the DMRG Scheme: We have provided a more precise explanation of the connection between our approach and the Density Matrix Renormalization Group (DMRG) scheme. Additionally, we have distinguished between two scenarios when referring to DMRG to avoid potential misunderstandings:
• In the context of sweep optimization strategies, most machine learning (ML) and neural network (NN) algorithms update all parameters simultaneously. In contrast, the DMRG approach freezes the parameters of the environment and optimizes only the targeted tensors (either one-site or two-site) until they reach their optimal state before moving to the next target. The optimization proceeds from left to right and then from right to left, which is referred to as a “sweep.”
• In the DMRG two-site update scheme, we target two adjacent tensors simultaneously; they are then factorized back into two separate tensors, producing a new link index. The bond dimension and block structure of this new link index differ from the original and adjust dynamically during the sweep. In contrast, the one-site update scheme keeps both the bond dimension and the block structure fixed at initialization, with updates applied only to the tensor elements while the structure remains unchanged. (A code sketch of the sweep strategy is given after this list.)
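To make the contrast concrete, here is a minimal sketch of such a sweep in Python/NumPy. This is an illustration under assumptions, not the authors' code: it implements the one-site variant, where the environment is frozen and each targeted core is optimized to convergence before the sweep moves on, and `local_loss_grad` is a hypothetical user-supplied function returning the gradient of the training loss with respect to a single core while all other cores are held fixed.

```python
import numpy as np

def one_site_sweep(cores, local_loss_grad, lr=0.01, n_steps=50):
    """One full sweep (left to right, then right to left) over the MPS cores.

    cores: list of numpy arrays of shape (left_bond, phys_dim, right_bond).
    local_loss_grad(cores, i): assumed user-supplied gradient of the training
    loss with respect to core i, holding every other core (the "environment")
    fixed.
    """
    n = len(cores)
    order = list(range(n)) + list(range(n - 2, -1, -1))  # L->R, then R->L
    for i in order:
        # Freeze the environment and optimize only core i to convergence
        # before moving to the next target, as in a DMRG sweep.
        for _ in range(n_steps):
            cores[i] = cores[i] - lr * local_loss_grad(cores, i)
    return cores
```

A global gradient method would instead take one step on every core per iteration; the sweep order above is what allows environment contractions to be cached between local updates.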
3. Additional Citations: We have added four more references in Section 2.C (“Related Work”) as suggested by the referees.
4. Implementation of SciPost LaTeX Template: We have reformatted the manuscript to comply with the SciPost LaTeX template requirements.
List of changes
1. We have applied the SciPost template in this version.
2. Rephrased the 4th Paragraph of the Introduction:
The original text:
“This restriction can be best understood within the BM formalism, where a TN model can be viewed as a ‘synthetic’ many-body wavefunction, with the number of values obtainable by each random variable setting the dimension of the associated local spin. In this picture, a continuous random variable would necessitate infinite-dimensional local spins, which have received scant attention in the many-body TN community.”
Has been rephrased to:
“This restriction can be best understood within the BM formalism, which is often thought of by physicists as describing many-body wavefunctions. In this context, a TN model can be viewed as a ‘synthetic’ many-body wavefunction, with the number of possible values of each random variable setting the dimension of the associated local spin. Because the BM formalism is primarily used in many-body quantum physics, where a TN describes a discrete ‘orbital’ or site space, it seems natural from that standpoint to use them exclusively for discrete variables. Continuous random variables would necessitate infinite-dimensional local spins, which have received less attention in the many-body TN community.”
3. Extended the Last Paragraph of Section III A:
The original text:
“When training an MPS for a given task (e.g. estimating ground state energies, classifying input data, learning probability distributions, etc.), the tensor cores of the MPS can be optimized in several different ways, including gradient descent and density matrix renormalization group (DMRG). We utilize gradient descent in the following for simplicity, but mention that the use of DMRG allows for dynamic control over the bond dimensions of the MPS.”
Has been extended to:
“There are typically two different optimization and update strategies. One approach involves updating all tensors incrementally using gradient-based algorithms, as is commonly employed to train neural networks in machine learning settings. The other approach targets one site or two adjacent sites, optimizing them fully before moving to the next target. This method involves iteratively sweeping and targeting tensors from left to right and then right to left, inspired by the DMRG sweeps used in calculating ground states. At each step for a given target, we use gradient descent methods to update the bond tensors until convergence, thereby avoiding the frequent recalculation of environment tensor contractions.
Similar to DMRG schemes, we can target one or two adjacent sites for optimization. In the one-site update approach, the bond dimension is fixed and predetermined. For the two-site update, the two tensors are contracted to form a bond tensor, which is then optimized via gradient-based methods until convergence. The bond tensor can then be factorized back into two adjacent tensors, with the dimension of the newly factorized bond dynamically adjusted based on the singular value spectrum of the decomposition. We will refer to this approach as the DMRG two-site scheme in the following discussion. However, unlike traditional DMRG methods for ground state problems, this approach does not involve solving an eigenvalue problem.”
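To accompany this description, here is a minimal sketch of one two-site update step in Python/NumPy. It is an illustration under assumptions rather than the authors' implementation: `bond_loss_grad` is a hypothetical stand-in for the gradient of the training loss with respect to the merged bond tensor, and the SVD cutoff is what lets the bond dimension adjust dynamically; note that, as stated above, no eigenvalue problem is solved.

```python
import numpy as np

def two_site_update(A, B, bond_loss_grad, lr=0.01, n_steps=50, svd_cutoff=1e-8):
    """One two-site update: merge, optimize, and re-factorize two MPS cores.

    A: (Dl, d1, Dm) and B: (Dm, d2, Dr) are adjacent cores; bond_loss_grad
    is an assumed gradient of the training loss w.r.t. the merged bond tensor.
    """
    Dl, d1, _ = A.shape
    _, d2, Dr = B.shape
    # Contract the shared bond into a single two-site "bond tensor".
    theta = np.tensordot(A, B, axes=([2], [0]))  # shape (Dl, d1, d2, Dr)
    # Optimize the bond tensor directly with gradient steps; unlike
    # ground-state DMRG, no eigenvalue problem is solved here.
    for _ in range(n_steps):
        theta = theta - lr * bond_loss_grad(theta)
    # Factorize back into two cores; singular values below the (relative)
    # cutoff are discarded, so the new bond dimension adjusts dynamically.
    U, S, Vh = np.linalg.svd(theta.reshape(Dl * d1, d2 * Dr), full_matrices=False)
    keep = max(1, int(np.sum(S > svd_cutoff * S[0])))
    A_new = U[:, :keep].reshape(Dl, d1, keep)
    B_new = (S[:keep, None] * Vh[:keep, :]).reshape(keep, d2, Dr)
    return A_new, B_new
```

Absorbing the singular values into the right core, as done here, is one common convention; splitting the square root of S between the two cores is an equally valid choice.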
4. Adopted More Specific Terminology for the DMRG Approach:
Throughout the paper, we have used more precise terms such as “DMRG update and sweep schemes” when referring to the DMRG approach. Examples include:
- In the last paragraph of Section I:
“… which in turn allows the use of DMRG [update and sweep schemes] and perfect sampling algorithms within this new setting.”
- In the first paragraph of Section IV:
“As a concrete example, training using two-site DMRG [update scheme] leads to a memory cost of …”
- In the last sentence of Section VIII:
“Along similar lines, we anticipate generalizations of two-site DMRG [update scheme] that permit the dynamic variation of both D and χ to be a useful aid for optimizing continuous-valued TN models.”
5. Cited the four additional papers suggested by the referees in Section 2.C (“Related Work”).
Current status:
Editorial decision: For Journal SciPost Physics: Publish (status: Editorial decision fixed and (if required) accepted by authors)
Anonymous on 2024-09-03 [id 4734]
Manuscript lacks DOIs, see https://scipost.org/SciPostPhys/authoring#manuprep.
Anonymous on 2024-09-08 [id 4747]
(in reply to Anonymous Comment on 2024-09-03 [id 4734]) Thank you for pointing this out. We'll fix it in the next version.