In this dissertation, we explore the impact of geometry and topology on the capabilities of deep learning models. Learning requires fitting a model to the training data while generalizing as accurately as possible to unseen data. Naturally, properties of the underlying data distribution affect the ability of models to learn the risk-minimizing function. In addition, the sample size and number of parameters required are quantities that can be reduced by leveraging geometric and topological information.
In Chapter 2, we show that under certain assumptions on the network activation function, sets of networks with a fixed architecture are not closed in Sobolev spaces. This means that a function and its derivatives can often be approximated to arbitrary accuracy by networks of a given architecture, even when the function itself cannot be realized exactly by such a network. However, doing so forces the network parameters to blow up, which provides further insight into the approximation capabilities of neural networks.
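As a simple illustration of this phenomenon (the example and notation here are illustrative rather than taken from Chapter 2), for many common activations $\sigma$ the derivative $\sigma'$ is not exactly expressible by a small fixed-size network, yet it arises as a limit of difference quotients,
\[
\frac{1}{h}\,\sigma(x+h) \;-\; \frac{1}{h}\,\sigma(x) \;\longrightarrow\; \sigma'(x) \qquad \text{as } h \to 0,
\]
where each approximant is itself a two-neuron network whose outer weights $1/h$ diverge as the target is approached. This is the pattern behind non-closedness: the limiting function can be approximated to arbitrary accuracy, but only at the cost of exploding parameters.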
Chapter 3 analyzes the generalization benefits of data augmentation. Though data augmentation is widely believed to improve generalization, we establish novel, provably tighter bounds on generalization error under algorithmic and distributional assumptions. In particular, algorithms satisfying strong stability criteria enjoy improved generalization under data augmentation. Moreover, invariance properties of the data distribution can ensure that we learn the risk-minimizing function with better generalization.
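In schematic form (the notation is illustrative, not necessarily that of Chapter 3), data augmentation replaces the empirical risk by an average over a distribution $Q$ of transformations $g$ acting on the inputs,
\[
\widehat{R}_{\mathrm{aug}}(f) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathbb{E}_{g \sim Q}\Bigl[\ell\bigl(f(g \cdot x_i),\, y_i\bigr)\Bigr].
\]
When the data distribution is invariant under these transformations, averaging over transformed copies of each sample preserves unbiasedness of the empirical risk while it can only reduce its variance, which is one mechanism by which augmentation yields tighter generalization guarantees.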
We turn to semi-supervised learning in Chapter 4, where we demonstrate how autoencoders applied to separate charts of a manifold can reduce the model complexity required for reconstruction. Under assumptions on the geometry and topology of the manifold, we characterize how many charts are needed for encoding via linear projections. We also show that this approach has only a mild impact on decoder complexity, which depends only weakly on the ambient data dimension.
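Schematically (again with illustrative notation), a chart-based autoencoder covers the data manifold $\mathcal{M} \subset \mathbb{R}^{D}$ by charts $U_1, \ldots, U_k$ and, on each chart, encodes via a linear projection $P_j \colon \mathbb{R}^{D} \to \mathbb{R}^{d}$ paired with a nonlinear decoder $\mathcal{D}_j \colon \mathbb{R}^{d} \to \mathbb{R}^{D}$, so that
\[
x \;\approx\; \mathcal{D}_{j(x)}\bigl(P_{j(x)}\, x\bigr), \qquad x \in U_{j(x)} \subset \mathcal{M},
\]
where $j(x)$ selects the chart containing $x$ and $d$ is the intrinsic dimension. The geometric and topological assumptions govern how many charts $k$ are needed for such linear encoders to faithfully represent points on each chart.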
Finally, Chapter 5 studies point cloud classification via a linear optimal transport embedding. We provide sufficient conditions under which distributions can be nearly isometrically embedded into Euclidean space via input-convex neural networks trained to approximate optimal transport maps. We can then linearly separate classes based on point clouds sampled from the target distributions. Once again, we leverage the underlying geometry of the data to improve model capabilities.
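In outline (notation illustrative), the linear optimal transport embedding fixes a reference distribution $\rho$ and represents each distribution $\mu_i$ by the optimal transport map pushing $\rho$ onto $\mu_i$, written as the gradient of a convex potential and approximated in practice by an input-convex network,
\[
\mu_i \;\longmapsto\; T_i = \nabla \varphi_i, \qquad (T_i)_{\#}\rho = \mu_i,
\]
so that, under suitable conditions, $W_2(\mu_i, \mu_j) \approx \|T_i - T_j\|_{L^2(\rho)}$ and the embedded maps live in a Euclidean (Hilbert) space where classes of point clouds can be separated by a hyperplane.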