The Wasserstein distance has become increasingly important in machine learning and deep learning. It measures the distance between two probability distributions on a given metric space. In practice, computing the Wasserstein distance between two discrete measures requires solving a linear programming problem. This problem has been studied extensively and can be solved efficiently, after entropic regularization, via the well-known Sinkhorn algorithm. This thesis studies several variants of the standard Wasserstein distance that are formulated as minimax problems. Because of the nontrivial linear constraints in the Wasserstein distance, standard minimax solvers, such as the Gradient Descent Ascent algorithm, cannot be applied easily. We therefore focus on designing new, efficient algorithms for solving minimax problems in optimal transport and on establishing their convergence guarantees.

In Chapter 2, we study the Projection Robust Wasserstein (PRW) distance, which was proposed to alleviate the curse of dimensionality in optimal transport. The PRW distance projects the data sampled from the high-dimensional probability distributions onto a lower-dimensional subspace and then computes the Wasserstein distance between the projected data. However, this approach requires solving a max-min problem over the Stiefel manifold, which is very challenging in practice. We propose a Riemannian block coordinate descent (RBCD) method to solve this problem, based on a novel reformulation of the regularized max-min problem over the Stiefel manifold. We show that the arithmetic-operation complexity of RBCD for obtaining an ε-stationary point is O(ε^{-3}), which significantly improves the corresponding complexity of the state of the art, the RGAS (Riemannian Gradient Ascent with Sinkhorn Iteration) algorithm, which is O(ε^{-12}). Moreover, RBCD has a very low per-iteration cost and is therefore suitable for large-scale problems. Numerical results on both synthetic and real datasets demonstrate that our method is more efficient than existing methods, especially when the number of samples is very large.
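To fix ideas, the max-min problem over the Stiefel manifold mentioned above can be written schematically (for the order-2 case) as
\[
\mathcal{P}_k^2(\mu,\nu) \;=\; \max_{U \in \mathrm{St}(d,k)} \;\min_{\pi \in \Pi(\mu,\nu)} \int \bigl\|U^{\top}(x-y)\bigr\|_2^2 \,\mathrm{d}\pi(x,y),
\qquad \mathrm{St}(d,k) := \bigl\{U \in \mathbb{R}^{d\times k} : U^{\top}U = I_k\bigr\},
\]
where k is the projection dimension and Π(μ, ν) is the set of couplings of μ and ν; this notation is illustrative and may differ from the conventions used in the thesis. The inner minimization is an optimal transport problem between the projected measures, while the outer maximization over the Stiefel manifold is what calls for Riemannian optimization tools.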
In Chapter 3, we extend the ideas behind the PRW distance to the Wasserstein barycenter problem. The Wasserstein barycenter is a useful tool for collecting and aggregating information from several probability measures or histograms, a fundamental task in machine learning. However, approximating the Wasserstein barycenter is numerically challenging because of the curse of dimensionality. We propose the projection robust Wasserstein barycenter (PRWB), which has the potential to mitigate the curse of dimensionality. Since the PRWB is numerically very challenging to solve, we further propose a relaxed PRWB (RPRWB) model, which is more tractable. The RPRWB projects the probability measures onto a lower-dimensional subspace that maximizes the Wasserstein barycenter objective; the resulting problem is again a max-min problem over the Stiefel manifold. By combining the iterative Bregman projection algorithm with Riemannian optimization, we propose two new algorithms for computing the RPRWB and analyze their arithmetic-operation complexity for obtaining an ε-stationary solution. We further incorporate the RPRWB into a discrete distribution clustering algorithm, and numerical results on real text datasets confirm that our RPRWB model significantly improves clustering performance.

In Chapter 4, we study another interesting minimax problem in optimal transport, the Equitable and Optimal Transport (EOT) problem. The EOT problem has many applications, such as fair division and optimal transport with multiple agents. To solve large-scale EOT problems, Scetbon et al. proposed perturbing the problem by adding an entropy regularization and developed a projected alternating maximization algorithm (PAM) to solve the dual of the entropy-regularized EOT. We provide the first convergence analysis of PAM, and we propose a novel rounding procedure that constructs a primal solution to the original EOT problem. We also propose a variant of PAM that incorporates an extrapolation technique and numerically improves the performance of PAM. The results in this chapter may also shed light on block coordinate (gradient) descent methods for general optimization problems.

Finally, we summarize our main contributions in Chapter 5.
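As a reference point for the entropic-regularization machinery that recurs throughout this abstract (the Sinkhorn algorithm, iterative Bregman projections, and the entropy-regularized EOT dual), the following is a minimal Python sketch of the Sinkhorn iteration for entropy-regularized optimal transport between two discrete measures. It is purely illustrative: the variable names, the regularization parameter eta, and the stopping rule are assumptions and do not correspond to the notation or implementations used in the thesis.

```python
import numpy as np

def sinkhorn(a, b, C, eta=0.05, n_iters=1000, tol=1e-9):
    """Illustrative Sinkhorn iteration for entropy-regularized optimal transport.

    a, b : probability vectors of the two discrete measures (nonnegative, sum to 1)
    C    : cost matrix of shape (len(a), len(b))
    eta  : entropy regularization strength (illustrative choice)
    Returns an approximate transport plan whose marginals are close to (a, b).
    """
    K = np.exp(-C / eta)              # Gibbs kernel induced by the cost matrix
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)               # scale rows to match the first marginal
        v = b / (K.T @ u)             # scale columns to match the second marginal
        P = u[:, None] * K * v[None, :]
        if np.abs(P.sum(axis=1) - a).sum() < tol:   # crude stopping criterion
            break
    return P
```

Given the returned plan P, the value np.sum(P * C) approximates the (regularized) transport cost. Scaling updates of this kind underlie, for example, the RGAS baseline discussed in Chapter 2 and the iterative Bregman projections used in Chapter 3.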